Re: [PATCH bpf] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-05-07 Thread Michael Ellerman
Puranjay Mohan  writes:
> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
> return value to be fully ordered.
>
> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
> BPF_CMPXCHG) return a value back so they need to be JITed to fully
> ordered operations. POWERPC currently emits relaxed operations for
> these.

Thanks for catching this.

> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index 2f39c50ca729..b635e5344e8a 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -853,6 +853,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> u32 *fimage, struct code
>   /* Get offset into TMP_REG */
>   EMIT(PPC_RAW_LI(tmp_reg, off));
>   tmp_idx = ctx->idx * 4;
> + /*
> +  * Enforce full ordering for operations with BPF_FETCH 
> by emitting a 'sync'
> +  * before and after the operation.
> +  *
> +  * This is a requirement in the Linux Kernel Memory 
> Model.
> +  * See __cmpxchg_u64() in asm/cmpxchg.h as an example.
> +  */
> + if (imm & BPF_FETCH)
> + EMIT(PPC_RAW_SYNC());
>   /* load value from memory into r0 */
>   EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
>  
> @@ -905,6 +914,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> u32 *fimage, struct code
>  
>   /* For the BPF_FETCH variant, get old data into src_reg 
> */
>   if (imm & BPF_FETCH) {
> + /* Emit 'sync' to enforce full ordering */
> + EMIT(PPC_RAW_SYNC());
>   EMIT(PPC_RAW_MR(ret_reg, ax_reg));
>   if (!fp->aux->verifier_zext)
>   EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* 
> higher 32-bit */

On 32-bit there are non-SMP systems where those syncs will probably be 
expensive.

I think just adding an IS_ENABLED(CONFIG_SMP) around the syncs is
probably sufficient. Christophe?
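That is, something like this (untested sketch, reusing the BPF_FETCH check
from the hunk above):

	/*
	 * Hypothetical variant: only emit the barrier on SMP builds,
	 * where the full ordering actually matters.
	 */
	if (IS_ENABLED(CONFIG_SMP) && (imm & BPF_FETCH))
		EMIT(PPC_RAW_SYNC());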

cheers


Re: [PATCH 0/4] ASoC: Constify static snd_pcm_hardware

2024-05-07 Thread Mark Brown
On Mon, 29 Apr 2024 13:48:45 +0200, Krzysztof Kozlowski wrote:
> No dependencies.
> 
> Static 'struct snd_pcm_hardware' is not modified by few drivers and its
> copy is passed to the core, so it can be made const for increased code
> safety.
> 
> Best regards,
> Krzysztof
> 
> [...]
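For reference, the pattern being applied is simply this (illustrative
sketch with a hypothetical driver; the per-driver changes are in the
commits listed below):

	#include <sound/pcm.h>

	/* was: static struct snd_pcm_hardware foo_pcm_hardware = { ... }; */
	static const struct snd_pcm_hardware foo_pcm_hardware = {
		.info             = SNDRV_PCM_INFO_INTERLEAVED | SNDRV_PCM_INFO_MMAP,
		.period_bytes_min = 32,
		.period_bytes_max = 64 * 1024,
		.periods_min      = 2,
		.periods_max      = 255,
		.buffer_bytes_max = 128 * 1024,
	};

	/* The core only copies it, e.g.:
	 * snd_soc_set_runtime_hwparams(substream, &foo_pcm_hardware);
	 */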

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/4] ASoC: qcom: Constify static snd_pcm_hardware
  commit: e6fa3509cb32df373b15212a99f69a6595efd1c3
[2/4] ASoC: fsl: Constify static snd_pcm_hardware
  commit: ed90156037659473ee95eafe3f72d8498e5384ff
[3/4] ASoC: meson: Constify static snd_pcm_hardware
  commit: 7b5ce9f0c52a5885d34d46bba62e9eaedc3dd459
[4/4] ASoC: uniphier: Constify static snd_pcm_hardware
  commit: 74a15fabd271d0fd82ceecbbfa1b98ea0a4709dd

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-05-07 Thread kernel test robot
Hi Vignesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on tip/x86/core kees/for-next/pstore 
kees/for-next/kspp linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Vignesh-Balasubramanian/x86-elf-Add-a-new-note-section-containing-Xfeatures-information-to-x86-core-files/20240507-175615
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
for-next/execve
patch link:
https://lore.kernel.org/r/20240507095330.2674-2-vigbalas%40amd.com
patch subject: [PATCH v2 1/1] x86/elf: Add a new .note section containing 
Xfeatures information to x86 core files
config: i386-randconfig-013-20240508 
(https://download.01.org/0day-ci/archive/20240508/202405080809.lgyhryu3-...@intel.com/config)
compiler: clang version 18.1.4 (https://github.com/llvm/llvm-project 
e6c3289804a67ea0bb6a86fadbe454dd93b8d855)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240508/202405080809.lgyhryu3-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405080809.lgyhryu3-...@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/fpu/xstate.c:91:19: warning: unused variable 'owner_name' 
>> [-Wunused-const-variable]
  91 | static const char owner_name[] = "LINUX";
 |   ^~
   1 warning generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for DRM_I915_DEBUG_GEM
   Depends on [n]: HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && 
DRM_I915_WERROR [=n]
   Selected by [m]:
   - DRM_I915_DEBUG [=y] && HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && 
!COMPILE_TEST [=n]


vim +/owner_name +91 arch/x86/kernel/fpu/xstate.c

90  
  > 91  static const char owner_name[] = "LINUX";
92  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH RESEND v8 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-05-07 Thread Google
On Sun,  5 May 2024 19:06:19 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (IBM)" 
> 
> Several architectures override module_alloc() only to define address
> range for code allocations different than VMALLOC address space.
> 
> Provide a generic implementation in execmem that uses the parameters for
> address space ranges, required alignment and page protections provided
> by architectures.
> 
> The architectures must fill execmem_info structure and implement
> execmem_arch_setup() that returns a pointer to that structure. This way the
> execmem initialization won't be called from every architecture, but rather
> from a central place, namely a core_initcall() in execmem.
> 
> The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> with the parameters defined by the architectures.  If an architecture does
> not implement execmem_arch_setup(), execmem_alloc() will fall back to
> module_alloc().
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Song Liu 
> ---
>  arch/loongarch/kernel/module.c | 19 --
>  arch/mips/kernel/module.c  | 20 --
>  arch/nios2/kernel/module.c | 21 ---
>  arch/parisc/kernel/module.c| 24 
>  arch/riscv/kernel/module.c | 24 
>  arch/sparc/kernel/module.c | 20 --
>  include/linux/execmem.h| 47 
>  mm/execmem.c   | 67 --
>  mm/mm_init.c   |  2 +
>  9 files changed, 210 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
> index c7d0338d12c1..ca6dd7ea1610 100644
> --- a/arch/loongarch/kernel/module.c
> +++ b/arch/loongarch/kernel/module.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
> *strtab,
>   return 0;
>  }
>  
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
> __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,
> + .end= MODULES_END,
> + .pgprot = PAGE_KERNEL,
> + .alignment = 1,
> + },
> + },
> + };
> +
> + return &execmem_info;
>  }
>  
>  static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 9a6c96014904..59225a3cf918 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  struct mips_hi16 {
> @@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
>  static DEFINE_SPINLOCK(dbe_lock);
>  
>  #ifdef MODULES_VADDR
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,
> + .end= MODULES_END,
> + .pgprot = PAGE_KERNEL,
> + .alignment = 1,
> + },
> + },
> + };
> +
> + return &execmem_info;
>  }
>  #endif
>  
> diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
> index 9c97b7513853..0d1ee86631fc 100644
> --- a/arch/nios2/kernel/module.c
> +++ b/arch/nios2/kernel/module.c
> @@ -18,15 +18,26 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC,
> - VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> - __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,

Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-05-07 Thread kernel test robot
Hi Vignesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on tip/x86/core kees/for-next/pstore 
kees/for-next/kspp linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Vignesh-Balasubramanian/x86-elf-Add-a-new-note-section-containing-Xfeatures-information-to-x86-core-files/20240507-175615
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
for-next/execve
patch link:
https://lore.kernel.org/r/20240507095330.2674-2-vigbalas%40amd.com
patch subject: [PATCH v2 1/1] x86/elf: Add a new .note section containing 
Xfeatures information to x86 core files
config: i386-buildonly-randconfig-006-20240508 
(https://download.01.org/0day-ci/archive/20240508/202405080715.hyq1ae9v-...@intel.com/config)
compiler: gcc-8 (Ubuntu 8.4.0-3ubuntu2) 8.4.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240508/202405080715.hyq1ae9v-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405080715.hyq1ae9v-...@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/fpu/xstate.c:91:19: warning: 'owner_name' defined but not 
>> used [-Wunused-const-variable=]
static const char owner_name[] = "LINUX";
  ^~
   In file included from include/linux/string.h:369,
from arch/x86/include/asm/page_32.h:18,
from arch/x86/include/asm/page.h:14,
from arch/x86/include/asm/processor.h:20,
from arch/x86/include/asm/timex.h:5,
from include/linux/timex.h:67,
from include/linux/time32.h:13,
from include/linux/time.h:60,
from include/linux/compat.h:10,
from arch/x86/kernel/fpu/xstate.c:8:
   In function 'fortify_memcpy_chk',
   inlined from 'membuf_write.isra.6' at include/linux/regset.h:42:3,
   inlined from '__copy_xstate_to_uabi_buf' at 
arch/x86/kernel/fpu/xstate.c:1049:2:
   include/linux/fortify-string.h:562:4: warning: call to 
'__read_overflow2_field' declared with attribute warning: detected read beyond 
size of field (2nd parameter); maybe use struct_group()?
   __read_overflow2_field(q_size_field, size);
   ^~


vim +/owner_name +91 arch/x86/kernel/fpu/xstate.c

90  
  > 91  static const char owner_name[] = "LINUX";
92  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH RESEND v8 05/16] module: make module_memory_{alloc,free} more self-contained

2024-05-07 Thread Google
On Sun,  5 May 2024 19:06:17 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (IBM)" 
> 
> Move the logic related to the memory allocation and freeing into
> module_memory_alloc() and module_memory_free().
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> Signed-off-by: Mike Rapoport (IBM) 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  kernel/module/main.c | 64 +++-
>  1 file changed, 39 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index e1e8a7a9d6c1..5b82b069e0d3 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type 
> type)
>   mod_mem_type_is_core_data(type);
>  }
>  
> -static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
> +static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
>  {
> + unsigned int size = PAGE_ALIGN(mod->mem[type].size);
> + void *ptr;
> +
> + mod->mem[type].size = size;
> +
>   if (mod_mem_use_vmalloc(type))
> - return vzalloc(size);
> - return module_alloc(size);
> + ptr = vmalloc(size);
> + else
> + ptr = module_alloc(size);
> +
> + if (!ptr)
> + return -ENOMEM;
> +
> + /*
> +  * The pointer to these blocks of memory are stored on the module
> +  * structure and we keep that around so long as the module is
> +  * around. We only free that memory when we unload the module.
> +  * Just mark them as not being a leak then. The .init* ELF
> +  * sections *do* get freed after boot so we *could* treat them
> +  * slightly differently with kmemleak_ignore() and only grey
> +  * them out as they work as typical memory allocations which
> +  * *do* eventually get freed, but let's just keep things simple
> +  * and avoid *any* false positives.
> +  */
> + kmemleak_not_leak(ptr);
> +
> + memset(ptr, 0, size);
> + mod->mem[type].base = ptr;
> +
> + return 0;
>  }
>  
> -static void module_memory_free(void *ptr, enum mod_mem_type type)
> +static void module_memory_free(struct module *mod, enum mod_mem_type type)
>  {
> + void *ptr = mod->mem[type].base;
> +
>   if (mod_mem_use_vmalloc(type))
>   vfree(ptr);
>   else
> @@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
>   /* Free lock-classes; relies on the preceding sync_rcu(). */
>   lockdep_free_key_range(mod_mem->base, mod_mem->size);
>   if (mod_mem->size)
> - module_memory_free(mod_mem->base, type);
> + module_memory_free(mod, type);
>   }
>  
>   /* MOD_DATA hosts mod, so free it at last */
>   lockdep_free_key_range(mod->mem[MOD_DATA].base, 
> mod->mem[MOD_DATA].size);
> - module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
> + module_memory_free(mod, MOD_DATA);
>  }
>  
>  /* Free a module, remove from lists, etc. */
> @@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
> struct load_info *info)
>  static int move_module(struct module *mod, struct load_info *info)
>  {
>   int i;
> - void *ptr;
>   enum mod_mem_type t = 0;
>   int ret = -ENOMEM;
>  
> @@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   mod->mem[type].base = NULL;
>   continue;
>   }
> - mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
> - ptr = module_memory_alloc(mod->mem[type].size, type);
> - /*
> - * The pointer to these blocks of memory are stored on the 
> module
> - * structure and we keep that around so long as the module is
> - * around. We only free that memory when we unload the 
> module.
> - * Just mark them as not being a leak then. The .init* ELF
> - * sections *do* get freed after boot so we *could* treat 
> them
> - * slightly differently with kmemleak_ignore() and only grey
> - * them out as they work as typical memory allocations which
> - * *do* eventually get freed, but let's just keep things 
> simple
> - * and avoid *any* false positives.
> -  */
> - kmemleak_not_leak(ptr);
> - if (!ptr) {
> +
> + ret = module_memory_alloc(mod, type);
> + if (ret) {
>   t = type;
>   goto out_enomem;
>   }
> - memset(ptr, 0, mod->mem[type].size);
> - mod->mem[type].base = ptr;
>   }
>  
>   /* Transfer each section which specifies SHF_ALLOC */
> @@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   return 0;
>  out_enomem:
>   

[PATCH bpf] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-05-07 Thread Puranjay Mohan
The Linux Kernel Memory Model [1][2] requires RMW operations that have a
return value to be fully ordered.

BPF atomic operations with BPF_FETCH (including BPF_XCHG and
BPF_CMPXCHG) return a value back so they need to be JITed to fully
ordered operations. POWERPC currently emits relaxed operations for
these.

We can show this by running the following litmus-test:

PPC SB+atomic_add+fetch

{
0:r0=x;  (* dst reg assuming offset is 0 *)
0:r1=2;  (* src reg *)
0:r2=1;
0:r4=y;  (* P0 writes to this, P1 reads this *)
0:r5=z;  (* P1 writes to this, P0 reads this *)
0:r6=0;

1:r2=1;
1:r4=y;
1:r5=z;
}

P0  | P1;
stw r2, 0(r4)   | stw  r2,0(r5) ;
|   ;
loop:lwarx  r3, r6, r0  |   ;
mr  r8, r3  |   ;
add r3, r3, r1  | sync  ;
stwcx.  r3, r6, r0  |   ;
bne loop|   ;
mr  r1, r8  |   ;
|   ;
lwa r7, 0(r5)   | lwa  r7,0(r4) ;

~exists(0:r7=0 /\ 1:r7=0)

Witnesses
Positive: 9 Negative: 3
Condition ~exists (0:r7=0 /\ 1:r7=0)
Observation SB+atomic_add+fetch Sometimes 3 9

This test shows that the older store in P0 is reordered with a newer
load to a different address, even though there is an RMW operation with
fetch between them. Adding a sync before and after the RMW fixes the
issue:

Witnesses
Positive: 9 Negative: 0
Condition ~exists (0:r7=0 /\ 1:r7=0)
Observation SB+atomic_add+fetch Never 0 9

[1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
[2] https://www.kernel.org/doc/Documentation/atomic_t.txt

Fixes: 65112709115f ("powerpc/bpf/64: add support for BPF_ATOMIC bitwise 
operations")
Signed-off-by: Puranjay Mohan 
---
 arch/powerpc/net/bpf_jit_comp32.c | 11 +++
 arch/powerpc/net/bpf_jit_comp64.c | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 2f39c50ca729..b635e5344e8a 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -853,6 +853,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
/* Get offset into TMP_REG */
EMIT(PPC_RAW_LI(tmp_reg, off));
tmp_idx = ctx->idx * 4;
+   /*
+* Enforce full ordering for operations with BPF_FETCH 
by emitting a 'sync'
+* before and after the operation.
+*
+* This is a requirement in the Linux Kernel Memory 
Model.
+* See __cmpxchg_u64() in asm/cmpxchg.h as an example.
+*/
+   if (imm & BPF_FETCH)
+   EMIT(PPC_RAW_SYNC());
/* load value from memory into r0 */
EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
 
@@ -905,6 +914,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 
*fimage, struct code
 
/* For the BPF_FETCH variant, get old data into src_reg 
*/
if (imm & BPF_FETCH) {
+   /* Emit 'sync' to enforce full ordering */
+   EMIT(PPC_RAW_SYNC());
EMIT(PPC_RAW_MR(ret_reg, ax_reg));
if (!fp->aux->verifier_zext)
EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* 
higher 32-bit */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..27026f19605d 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -804,6 +804,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
/* Get offset into TMP_REG_1 */
EMIT(PPC_RAW_LI(tmp1_reg, off));
tmp_idx = ctx->idx * 4;
+   /*
+* Enforce full ordering for operations with BPF_FETCH 
by emitting a 'sync'
+* before and after the operation.
+*
+* This is a requirement in the Linux Kernel Memory 
Model.
+* See __cmpxchg_u64() in asm/cmpxchg.h as an example.
+*/
+   if (imm & BPF_FETCH)
+   EMIT(PPC_RAW_SYNC());
/* load value from memory into TMP_REG_2 */
if (size == BPF_DW)
EMIT(PPC_RAW_LDARX(tmp2_reg, tmp1_reg, dst_reg, 
0));
@@ -865,6 +874,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 
*fimage, struct code
PPC_BCC_SHORT(COND_NE, tmp_idx);
 
if (imm & BPF_FETCH) {
+   

Re: [PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue

2024-05-07 Thread Russell King (Oracle)
On Tue, May 07, 2024 at 07:01:11PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/ethernet/* from tasklet to BH workqueue.

I doubt you're going to get many comments on this patch, being so large
and spread across all drivers. I'm not going to bother trying to edit
this down to something more sensible, I'll just plonk my comment here.

For the mvpp2 driver, you're only updating a comment - and looking at
it, the comment no longer reflects the code. It doesn't make use of
tasklets at all. That makes the comment wrong whether or not it's
updated. So I suggest rather than doing a search and replace for
"tasklet" to "BH blahblah" (sorry, I don't remember what you replaced
it with) just get rid of that bit of the comment.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue

2024-05-07 Thread Allen
On Tue, May 7, 2024 at 12:23 PM Russell King (Oracle)
 wrote:
>
> On Tue, May 07, 2024 at 07:01:11PM +, Allen Pais wrote:
> > The only generic interface to execute asynchronously in the BH context is
> > tasklet; however, it's marked deprecated and has some design flaws. To
> > replace tasklets, BH workqueue support was recently added. A BH workqueue
> > behaves similarly to regular workqueues except that the queued work items
> > are executed in the BH context.
> >
> > This patch converts drivers/ethernet/* from tasklet to BH workqueue.
>
> I doubt you're going to get many comments on this patch, being so large
> and spread across all drivers. I'm not going to bother trying to edit
> this down to something more sensible, I'll just plonk my comment here.
>
> For the mvpp2 driver, you're only updating a comment - and looking at
> it, the comment no longer reflects the code. It doesn't make use of
> tasklets at all. That makes the comment wrong whether or not it's
> updated. So I suggest rather than doing a search and replace for
> "tasklet" to "BH blahblah" (sorry, I don't remember what you replaced
> it with) just get rid of that bit of the comment.
>

 Thank you Russell.

 I will get rid of the comment. If it helps, I can create a patch for each
driver. We did that in the past; with this series, I thought it would be
easier to apply one patch.

Thanks,

   - Allen


[PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue

2024-05-07 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/ethernet/* from tasklet to BH workqueue.
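For reference, the conversion follows roughly this pattern in each driver
(illustrative sketch only: the "foo" driver and its field names are made
up, and the diff below is the authoritative change):

	#include <linux/workqueue.h>

	struct foo_priv {
		/* was: struct tasklet_struct tx_tasklet; */
		struct work_struct tx_work;
	};

	/* was: static void foo_tx_task(struct tasklet_struct *t) */
	static void foo_tx_work(struct work_struct *work)
	{
		/* was: struct foo_priv *priv = from_tasklet(priv, t, tx_tasklet); */
		struct foo_priv *priv = container_of(work, struct foo_priv, tx_work);

		/* ... same body as the old tasklet callback, still in BH context ... */
	}

	static void foo_init(struct foo_priv *priv)
	{
		/* was: tasklet_setup(&priv->tx_tasklet, foo_tx_task); */
		INIT_WORK(&priv->tx_work, foo_tx_work);
	}

	static void foo_kick(struct foo_priv *priv)
	{
		/* was: tasklet_schedule(&priv->tx_tasklet); */
		queue_work(system_bh_wq, &priv->tx_work);
	}

	static void foo_stop(struct foo_priv *priv)
	{
		/* was: tasklet_kill(&priv->tx_tasklet); */
		cancel_work_sync(&priv->tx_work);
	}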

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git 
disable_work-v1

Signed-off-by: Allen Pais 
---
 drivers/infiniband/hw/mlx4/cq.c   |  2 +-
 drivers/infiniband/hw/mlx5/cq.c   |  2 +-
 drivers/net/ethernet/alteon/acenic.c  | 26 +++
 drivers/net/ethernet/alteon/acenic.h  |  7 +-
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 30 
 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c  | 16 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 16 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c  |  4 +-
 drivers/net/ethernet/amd/xgbe/xgbe.h  | 11 +--
 drivers/net/ethernet/broadcom/cnic.c  | 19 ++---
 drivers/net/ethernet/broadcom/cnic.h  |  2 +-
 drivers/net/ethernet/cadence/macb.h   |  3 +-
 drivers/net/ethernet/cadence/macb_main.c  | 10 +--
 .../net/ethernet/cavium/liquidio/lio_core.c   |  4 +-
 .../net/ethernet/cavium/liquidio/lio_main.c   | 25 +++
 .../ethernet/cavium/liquidio/lio_vf_main.c| 10 +--
 .../ethernet/cavium/liquidio/octeon_droq.c|  4 +-
 .../ethernet/cavium/liquidio/octeon_main.h|  5 +-
 .../net/ethernet/cavium/octeon/octeon_mgmt.c  | 12 ++--
 drivers/net/ethernet/cavium/thunder/nic.h |  5 +-
 .../net/ethernet/cavium/thunder/nicvf_main.c  | 24 +++
 .../ethernet/cavium/thunder/nicvf_queues.c|  5 +-
 .../ethernet/cavium/thunder/nicvf_queues.h|  3 +-
 drivers/net/ethernet/chelsio/cxgb/sge.c   | 19 ++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|  9 +--
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   |  2 +-
 .../ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c  |  4 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_uld.c|  2 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c  | 41 +--
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c|  6 +-
 drivers/net/ethernet/dlink/sundance.c | 41 +--
 .../net/ethernet/huawei/hinic/hinic_hw_cmdq.c |  2 +-
 .../net/ethernet/huawei/hinic/hinic_hw_eqs.c  | 17 +++--
 .../net/ethernet/huawei/hinic/hinic_hw_eqs.h  |  2 +-
 drivers/net/ethernet/ibm/ehea/ehea.h  |  3 +-
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 14 ++--
 drivers/net/ethernet/ibm/ibmvnic.c| 24 +++
 drivers/net/ethernet/ibm/ibmvnic.h|  2 +-
 drivers/net/ethernet/jme.c| 72 +--
 drivers/net/ethernet/jme.h|  9 +--
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  2 +-
 drivers/net/ethernet/marvell/skge.c   | 12 ++--
 drivers/net/ethernet/marvell/skge.h   |  3 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c| 12 ++--
 drivers/net/ethernet/mediatek/mtk_wed_wo.h|  3 +-
 drivers/net/ethernet/mellanox/mlx4/cq.c   | 42 +--
 drivers/net/ethernet/mellanox/mlx4/eq.c   | 10 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 11 +--
 drivers/net/ethernet/mellanox/mlx5/core/cq.c  | 38 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 12 ++--
 .../ethernet/mellanox/mlx5/core/fpga/conn.c   | 15 ++--
 .../ethernet/mellanox/mlx5/core/fpga/conn.h   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/lib/eq.h  | 11 +--
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 29 
 drivers/net/ethernet/micrel/ks8842.c  | 29 
 drivers/net/ethernet/micrel/ksz884x.c | 37 +-
 drivers/net/ethernet/microchip/lan743x_ptp.c  |  2 +-
 drivers/net/ethernet/natsemi/ns83820.c| 10 +--
 drivers/net/ethernet/netronome/nfp/nfd3/dp.c  |  7 +-
 .../net/ethernet/netronome/nfp/nfd3/nfd3.h|  2 +-
 drivers/net/ethernet/netronome/nfp/nfdk/dp.c  |  6 +-
 .../net/ethernet/netronome/nfp/nfdk/nfdk.h|  3 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  4 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   | 12 ++--
 .../net/ethernet/netronome/nfp/nfp_net_dp.h   |  4 +-
 drivers/net/ethernet/ni/nixge.c   | 19 ++---
 drivers/net/ethernet/qlogic/qed/qed.h |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_int.c |  6 +-
 drivers/net/ethernet/qlogic/qed/qed_int.h |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c| 20 +++---
 drivers/net/ethernet/sfc/falcon/farch.c   |  4 +-
 drivers/net/ethernet/sfc/falcon/net_driver.h  |  2 +-
 drivers/net/ethernet/sfc/falcon/selftest.c|  2 +-
 drivers/net/ethernet/sfc/net_driver.h |  2 +-
 drivers/net/ethernet/sfc/selftest.c   |  2 +-
 drivers/net/ethernet/sfc/siena/farch.c|  4 +-
 drivers/net/ethernet/sfc/siena/net_driver.h   |  2 +-
 drivers/net/ethernet/sfc/siena/selftest.c |  2 +-
 drivers/net/ether

[PATCH 0/1] Convert tasklets to BH workqueues in ethernet drivers

2024-05-07 Thread Allen Pais
This series focuses on converting the existing implementation of
tasklets to bottom half (BH) workqueues across various Ethernet
drivers under drivers/net/ethernet/*.

Impact:
 The conversion is expected to maintain or improve the performance
of the affected drivers. It also improves the maintainability and
readability of the driver code.

Testing:
 - Conducted standard network throughput and latency benchmarks
   to ensure performance parity or improvement.
 - Ran kernel regression tests to verify that changes do not introduce new 
issues.

I appreciate your review and feedback on this patch series.
Additional testing would also be really helpful.

Allen Pais (1):
  [RFC] ethernet: Convert from tasklet to BH workqueue

 drivers/infiniband/hw/mlx4/cq.c   |  2 +-
 drivers/infiniband/hw/mlx5/cq.c   |  2 +-
 drivers/net/ethernet/alteon/acenic.c  | 26 +++
 drivers/net/ethernet/alteon/acenic.h  |  7 +-
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 30 
 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c  | 16 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 16 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c  |  4 +-
 drivers/net/ethernet/amd/xgbe/xgbe.h  | 11 +--
 drivers/net/ethernet/broadcom/cnic.c  | 19 ++---
 drivers/net/ethernet/broadcom/cnic.h  |  2 +-
 drivers/net/ethernet/cadence/macb.h   |  3 +-
 drivers/net/ethernet/cadence/macb_main.c  | 10 +--
 .../net/ethernet/cavium/liquidio/lio_core.c   |  4 +-
 .../net/ethernet/cavium/liquidio/lio_main.c   | 25 +++
 .../ethernet/cavium/liquidio/lio_vf_main.c| 10 +--
 .../ethernet/cavium/liquidio/octeon_droq.c|  4 +-
 .../ethernet/cavium/liquidio/octeon_main.h|  5 +-
 .../net/ethernet/cavium/octeon/octeon_mgmt.c  | 12 ++--
 drivers/net/ethernet/cavium/thunder/nic.h |  5 +-
 .../net/ethernet/cavium/thunder/nicvf_main.c  | 24 +++
 .../ethernet/cavium/thunder/nicvf_queues.c|  5 +-
 .../ethernet/cavium/thunder/nicvf_queues.h|  3 +-
 drivers/net/ethernet/chelsio/cxgb/sge.c   | 19 ++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|  9 +--
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   |  2 +-
 .../ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c  |  4 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_uld.c|  2 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c  | 41 +--
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c|  6 +-
 drivers/net/ethernet/dlink/sundance.c | 41 +--
 .../net/ethernet/huawei/hinic/hinic_hw_cmdq.c |  2 +-
 .../net/ethernet/huawei/hinic/hinic_hw_eqs.c  | 17 +++--
 .../net/ethernet/huawei/hinic/hinic_hw_eqs.h  |  2 +-
 drivers/net/ethernet/ibm/ehea/ehea.h  |  3 +-
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 14 ++--
 drivers/net/ethernet/ibm/ibmvnic.c| 24 +++
 drivers/net/ethernet/ibm/ibmvnic.h|  2 +-
 drivers/net/ethernet/jme.c| 72 +--
 drivers/net/ethernet/jme.h|  9 +--
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  2 +-
 drivers/net/ethernet/marvell/skge.c   | 12 ++--
 drivers/net/ethernet/marvell/skge.h   |  3 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c| 12 ++--
 drivers/net/ethernet/mediatek/mtk_wed_wo.h|  3 +-
 drivers/net/ethernet/mellanox/mlx4/cq.c   | 42 +--
 drivers/net/ethernet/mellanox/mlx4/eq.c   | 10 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 11 +--
 drivers/net/ethernet/mellanox/mlx5/core/cq.c  | 38 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 12 ++--
 .../ethernet/mellanox/mlx5/core/fpga/conn.c   | 15 ++--
 .../ethernet/mellanox/mlx5/core/fpga/conn.h   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/lib/eq.h  | 11 +--
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 29 
 drivers/net/ethernet/micrel/ks8842.c  | 29 
 drivers/net/ethernet/micrel/ksz884x.c | 37 +-
 drivers/net/ethernet/microchip/lan743x_ptp.c  |  2 +-
 drivers/net/ethernet/natsemi/ns83820.c| 10 +--
 drivers/net/ethernet/netronome/nfp/nfd3/dp.c  |  7 +-
 .../net/ethernet/netronome/nfp/nfd3/nfd3.h|  2 +-
 drivers/net/ethernet/netronome/nfp/nfdk/dp.c  |  6 +-
 .../net/ethernet/netronome/nfp/nfdk/nfdk.h|  3 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  4 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   | 12 ++--
 .../net/ethernet/netronome/nfp/nfp_net_dp.h   |  4 +-
 drivers/net/ethernet/ni/nixge.c   | 19 ++---
 drivers/net/ethernet/qlogic/qed/qed.h |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_int.c |  6 +-
 drivers/net/ethernet/qlogic/qed/qed_int.h |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c| 20 +++---
 drivers/net/ethernet/sfc/falcon/farch.c   |  4 +-
 drivers/net/ethernet/sfc/falcon/net_driver.h  |  2 +-
 drivers/net/ethernet/sfc/falcon/selftest.c|  2 +-
 drivers/net/ethernet/sfc/net_driver.h |  2 +-
 drivers/net/ethernet/sfc/selftest.c   |  2 +-
 dr

Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-07 Thread Stephen Brennan
Christophe Leroy  writes:
> On 01/05/2024 at 18:29, Stephen Brennan wrote:
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>> 
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>> 
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> 
>>sudo perf probe --add commit_creds
>>sudo perf trace -e probe:commit_creds
>># In another terminal
>>make
>>sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>># Back to perf terminal
>># ctrl-c
>>sudo perf probe --del commit_creds
>> 
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>> 
>> Signed-off-by: Stephen Brennan 
>> ---
>> Changes in v3:
>>Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>variable and check it directly in the kprobe handlers.
>
> Isn't it safer to provide a function rather than a direct access to a
> variable?

Is the concern that other code could modify this variable? If so, then I
suppose the function call is safer. But the variable is not exported and
I think built-in code can be trusted not to muck with it. Maybe I'm
missing your point about safety though?
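
For context, the v3 approach amounts to roughly this (sketch only, not the
exact diff): a flag set from ftrace_kill() and checked at the top of each
arch's kprobe ftrace handler.

	/* kernel/kprobes.c */
	bool kprobe_ftrace_disabled;

	/* arch kprobe ftrace handler */
	void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
				   struct ftrace_ops *ops, struct ftrace_regs *fregs)
	{
		/* Bail out early if ftrace was killed; never fire the kprobe. */
		if (unlikely(kprobe_ftrace_disabled))
			return;

		/* ... normal pre/post handler dispatch ... */
	}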

> By the way, wouldn't it be more performant to use a static branch (jump
> label)?

I agree with Steven's concern that text modification would unfortunately
not be a good way to handle an error in text modification. Especially, I
believe there could be deadlock risks, as static key enablement requires
taking the text_mutex and the jump_label_mutex. I'd be concerned that
the text_mutex could already be held in some situations where
ftrace_kill() is called. But I'm not certain about that.

Thanks for taking a look!
Stephen


Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-05-07 Thread Shivaprasad G Bhat

Hi Jason,


On 5/6/24 23:13, Jason Gunthorpe wrote:

On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:

We have legacy workloads using VFIO in userspace/kvm guests running
on downstream distro kernels. We want these workloads to be able to
continue running on our arch.

It has been broken since 2018, I don't find this reasoning entirely
reasonable :\


Though upstream has been broken since 2018 for pSeries, the breaking

patches got trickled into downstream distro kernels only in the last few

years. The legacy workloads that were running on PowerNV with these

downstream distros are now broken on the pSeries logical partitions

without the fixes in this series.


I firmly believe the refactoring in this patch series is a step in
that direction.

But fine, as long as we are going to fix it. PPC really needs this to
be resolved to keep working.


Thanks, We are working on it.


Regards,

Shivaprasad



Jason


Re: [kvm-unit-tests PATCH v9 07/31] scripts: allow machine option to be specified in unittests.cfg

2024-05-07 Thread Thomas Huth

On 04/05/2024 14.28, Nicholas Piggin wrote:

This allows different machines with different requirements to be
supported by run_tests.sh, similarly to how different accelerators
are handled.

Acked-by: Thomas Huth 
Acked-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
  docs/unittests.txt   |  7 +++
  scripts/common.bash  |  8 ++--
  scripts/runtime.bash | 16 
  3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/docs/unittests.txt b/docs/unittests.txt
index 7cf2c55ad..6449efd78 100644
--- a/docs/unittests.txt
+++ b/docs/unittests.txt
@@ -42,6 +42,13 @@ For / directories that support multiple architectures, 
this restricts
  the test to the specified arch. By default, the test will run on any
  architecture.
  
+machine

+---
+For those architectures that support multiple machine types, this restricts
+the test to the specified machine. By default, the test will run on
+any machine type. (Note, the machine can be specified with the MACHINE=
+environment variable, and defaults to the architecture's default.)
+
  smp
  ---
  smp = 
diff --git a/scripts/common.bash b/scripts/common.bash
index 5e9ad53e2..3aa557c8c 100644
--- a/scripts/common.bash
+++ b/scripts/common.bash
@@ -10,6 +10,7 @@ function for_each_unittest()
local opts
local groups
local arch
+   local machine
local check
local accel
local timeout
@@ -21,7 +22,7 @@ function for_each_unittest()
if [[ "$line" =~ ^\[(.*)\]$ ]]; then
rematch=${BASH_REMATCH[1]}
if [ -n "${testname}" ]; then
-   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" 
"$check" "$accel" "$timeout"
+   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" 
"$machine" "$check" "$accel" "$timeout"
fi
testname=$rematch
smp=1
@@ -29,6 +30,7 @@ function for_each_unittest()
opts=""
groups=""
arch=""
+   machine=""
check=""
accel=""
timeout=""
@@ -58,6 +60,8 @@ function for_each_unittest()
groups=${BASH_REMATCH[1]}
elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
arch=${BASH_REMATCH[1]}
+   elif [[ $line =~ ^machine\ *=\ *(.*)$ ]]; then
+   machine=${BASH_REMATCH[1]}
elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then
check=${BASH_REMATCH[1]}
elif [[ $line =~ ^accel\ *=\ *(.*)$ ]]; then
@@ -67,7 +71,7 @@ function for_each_unittest()
fi
done
if [ -n "${testname}" ]; then
-   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" 
"$accel" "$timeout"
+   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$machine" 
"$check" "$accel" "$timeout"
fi
exec {fd}<&-
  }
diff --git a/scripts/runtime.bash b/scripts/runtime.bash
index 177b62166..0c96d6ea2 100644
--- a/scripts/runtime.bash
+++ b/scripts/runtime.bash
@@ -32,7 +32,7 @@ premature_failure()
  get_cmdline()
  {
  local kernel=$1
-echo "TESTNAME=$testname TIMEOUT=$timeout ACCEL=$accel $RUNTIME_arch_run 
$kernel -smp $smp $opts"
+echo "TESTNAME=$testname TIMEOUT=$timeout MACHINE=$machine ACCEL=$accel 
$RUNTIME_arch_run $kernel -smp $smp $opts"
  }
  
  skip_nodefault()

@@ -80,9 +80,10 @@ function run()
  local kernel="$4"
  local opts="$5"
  local arch="$6"
-local check="${CHECK:-$7}"
-local accel="$8"
-local timeout="${9:-$TIMEOUT}" # unittests.cfg overrides the default
+local machine="$7"
+local check="${CHECK:-$8}"
+local accel="$9"
+local timeout="${10:-$TIMEOUT}" # unittests.cfg overrides the default
  
  if [ "${CONFIG_EFI}" == "y" ]; then

  kernel=${kernel/%.flat/.efi}
@@ -116,6 +117,13 @@ function run()
  return 2
  fi
  
+if [ -n "$machine" ] && [ -n "$MACHINE" ] && [ "$machine" != "$MACHINE" ]; then

+print_result "SKIP" $testname "" "$machine only"
+return 2
+elif [ -n "$MACHINE" ]; then
+machine="$MACHINE"
+fi
+
  if [ -n "$accel" ] && [ -n "$ACCEL" ] && [ "$accel" != "$ACCEL" ]; then
  print_result "SKIP" $testname "" "$accel only, but ACCEL=$ACCEL"
  return 2


For some reason that I don't quite understand yet, this patch causes the 
"sieve" test to always time out on the s390x runner, see e.g.:


 https://gitlab.com/thuth/kvm-unit-tests/-/jobs/6798954987

Everything is fine with the previous patches (I have now pushed the previous 5 
patches to the repo):


 https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/pipelines/1281919104

Could it be that the TIME

Re: [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function

2024-05-07 Thread Arnaldo Carvalho de Melo
On Mon, May 06, 2024 at 09:40:15PM -0700, Namhyung Kim wrote:
> On Mon, May 6, 2024 at 5:19 AM Athira Rajeev
>  wrote:
> >
> > Fix a comment in function which explains how multi_regs field gets set
> > for an instruction. In the example, "mov  %rsi, 8(%rbx,%rcx,4)", the
> > comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
> > "src_multi_regs = 0"
> >
> > Signed-off-by: Athira Rajeev 
> 
> Acked-by: Namhyung Kim 

Cherry picked this one into perf-tools-next.

Thanks,

- Arnaldo


Re: [PATCH v18 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

2024-05-07 Thread Michael Ellerman
On Tue, 26 Mar 2024 11:24:07 +0530, Sourabh Jain wrote:
> Commit 247262756121 ("crash: add generic infrastructure for crash
> hotplug support") added a generic infrastructure that allows
> architectures to selectively update the kdump image component during CPU
> or memory add/remove events within the kernel itself.
> 
> This patch series adds crash hotplug handler for PowerPC and enable
> support to update the kdump image on CPU/Memory add/remove events.
> 
> [...]

Applied to powerpc/topic/kdump-hotplug.

[1/6] crash: forward memory_notify arg to arch crash hotplug handler
  https://git.kernel.org/powerpc/c/118005713e35a1893c6ee47ab2926cca277737de
[2/6] crash: add a new kexec flag for hotplug support
  https://git.kernel.org/powerpc/c/79365026f86948b52c3cb7bf099dded92c559b4c
[3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
  https://git.kernel.org/powerpc/c/f5f0da5a7b18fab383bac92044fd8f4f288c9d38
[4/6] PowerPC/kexec: make the update_cpus_node() function public
  https://git.kernel.org/powerpc/c/0857beff9c1ec8bb421a8b7a721da0f34cc886c0
[5/6] powerpc/crash: add crash CPU hotplug support
  https://git.kernel.org/powerpc/c/b741092d59761b98781fcb4f3f521312ed8d5006
[6/6] powerpc/crash: add crash memory hotplug support
  https://git.kernel.org/powerpc/c/849599b702ef8977fcd5b2f27c61ef773c42bb88

cheers


Re: [PATCH] powerpc/crash: remove unnecessary NULL check before kvfree()

2024-05-07 Thread Michael Ellerman
On Thu, 02 May 2024 23:50:40 +0530, Sourabh Jain wrote:
> Fix the following coccicheck build warning:
> 
> arch/powerpc/kexec/crash.c:488:2-8: WARNING: NULL check before some
> freeing functions is not needed.
> 
> 

Applied to powerpc/topic/kdump-hotplug.

[1/1] powerpc/crash: remove unnecessary NULL check before kvfree()
  https://git.kernel.org/powerpc/c/9803af291162dbca4b9773586a3f5c392f0dd974

cheers


Re: [PATCH 00/13] ASoC: Use snd_soc_substream_to_rtd() for accessing private_data

2024-05-07 Thread Mark Brown
On Tue, 30 Apr 2024 16:02:09 +0200, Krzysztof Kozlowski wrote:
> Do not open-code snd_soc_substream_to_rtd() when accessing
> snd_pcm_substream->private_data.  This makes code more consistent with
> rest of ASoC and allows in the future to move the field to any other
> place or add additional checks in snd_soc_substream_to_rtd().
> 
> Best regards,
> Krzysztof
> 
> [...]
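For reference, the change is mechanical (illustrative sketch with a
hypothetical callback; the per-driver changes are in the commits listed
below):

	#include <sound/soc.h>

	static int foo_trigger(struct snd_pcm_substream *substream, int cmd)
	{
		/* was: struct snd_soc_pcm_runtime *rtd = substream->private_data; */
		struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);

		dev_dbg(rtd->dev, "trigger cmd %d\n", cmd);
		return 0;
	}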

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[01/13] ASoC: qcom: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 77678a25d1ecf70dc1d7ea2c0ab7609af15b83d3
[02/13] ASoC: tegra: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 3beb985abbf29e660edd1708f8a120ae9bbbddc3
[03/13] ASoC: ti: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 72a666f47f958a57db16b6bdd9ed385674069693
[04/13] ASoC: arm: Use snd_soc_substream_to_rtd() for accessing private_data
commit: a80f2f8443a4ae10c568566f57fe704ea52c5bdb
[05/13] ASoC: amd: Use snd_soc_substream_to_rtd() for accessing private_data
commit: a84d84077512fc64cf1fc2292a3638690a026737
[06/13] ASoC: fsl: Use snd_soc_substream_to_rtd() for accessing private_data
commit: b695d8be5bba9897ee670ec102ca608ecaf625c4
[07/13] ASoC: img: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 3b62178720594e08bdf8a87515ccca0328fe41fe
[08/13] ASoC: kirkwood: Use snd_soc_substream_to_rtd() for accessing 
private_data
commit: fe42c3b75b93dee9a4010e2297f1783e48684af7
[09/13] ASoC: loongson: Use snd_soc_substream_to_rtd() for accessing 
private_data
commit: ffad75cebb865fef6f8e40f921c08c79a8faf7e3
[10/13] ASoC: mediatek: Use snd_soc_substream_to_rtd() for accessing 
private_data
commit: 410a45140fb76709cf2bbad84bc8a731acf632c8
[11/13] ASoC: meson: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 22f5680a9cbc7388f97e5386c15c325d6961b958
[12/13] ASoC: samsung: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 3e726593107d134221f666b4f2be612b278c3ddb
[13/13] ASoC: sunxi: Use snd_soc_substream_to_rtd() for accessing private_data
commit: 47aa51677c975a5f66bc93d1c527e8878cf34d6c

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



[PATCH] macintosh/ams: Fix unused variable warning

2024-05-07 Thread Michael Ellerman
If both CONFIG_SENSORS_AMS_PMU and CONFIG_SENSORS_AMS_I2C are unset,
there is an unused variable warning in the ams driver:

  drivers/macintosh/ams/ams-core.c: In function 'ams_init':
  drivers/macintosh/ams/ams-core.c:181:29: warning: unused variable 'np'
181 | struct device_node *np;

The driver needs at least one of the configs enabled in order to
actually function. So fix the compiler warning by ensuring at least one
of the configs is enabled.

Suggested-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 drivers/macintosh/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
index a0e717a986dc..fb38f68f 100644
--- a/drivers/macintosh/Kconfig
+++ b/drivers/macintosh/Kconfig
@@ -262,7 +262,7 @@ config SENSORS_AMS
  will be called ams.
 
 config SENSORS_AMS_PMU
-   bool "PMU variant"
+   bool "PMU variant" if SENSORS_AMS_I2C
depends on SENSORS_AMS && ADB_PMU
default y
help
-- 
2.45.0



Re: [kvm-unit-tests PATCH v9 17/31] powerpc: Add cpu_relax

2024-05-07 Thread Thomas Huth

On 04/05/2024 14.28, Nicholas Piggin wrote:

Add a cpu_relax variant that uses SMT priority nop instructions like
Linux. This was split out of the SMP patch because it affects the sprs
test case.

Signed-off-by: Nicholas Piggin 
---
  lib/ppc64/asm/barrier.h | 1 +
  powerpc/sprs.c  | 4 ++--
  2 files changed, 3 insertions(+), 2 deletions(-)



Reviewed-by: Thomas Huth 



Re: [PATCH 1/2] radix/kfence: map __kfence_pool at page granularity

2024-05-07 Thread Christophe Leroy


On 24/04/2024 at 13:09, Hari Bathini wrote:
> When KFENCE is enabled, total system memory is mapped at page level
> granularity. But in radix MMU mode, ~3GB additional memory is needed
> to map 100GB of system memory at page level granularity when compared
> to using 2MB direct mapping. This is not desired considering KFENCE is
> designed to be enabled in production kernels [1]. Also, mapping memory
> allocated for KFENCE pool at page granularity seems sufficient enough
> to enable KFENCE support. So, allocate __kfence_pool during bootup and
> map it at page granularity instead of mapping all system memory at
> page granularity.


That seems to be more or less copied from ARM64? Is that the best 
approach?

Can't you implement arch_kfence_init_pool() instead ?

Also, it seems your patch only addresses PPC64. The same should be done 
for PPC32 and there are probably parts that should be common.

> 
> Without patch:
>  # cat /proc/meminfo
>  MemTotal:   101201920 kB
> 
> With patch:
>  # cat /proc/meminfo
>  MemTotal:   104483904 kB
> 
> All kfence_test.c testcases passed with this patch.
> 
> [1] https://lore.kernel.org/all/20201103175841.3495947-2-el...@google.com/
> 
> Signed-off-by: Hari Bathini 
> ---
>   arch/powerpc/include/asm/kfence.h|  5 
>   arch/powerpc/mm/book3s64/radix_pgtable.c | 34 ++--
>   arch/powerpc/mm/init_64.c| 14 ++
>   3 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kfence.h 
> b/arch/powerpc/include/asm/kfence.h
> index 424ceef82ae6..18ec2b06ba1e 100644
> --- a/arch/powerpc/include/asm/kfence.h
> +++ b/arch/powerpc/include/asm/kfence.h
> @@ -8,6 +8,7 @@
>   #ifndef __ASM_POWERPC_KFENCE_H
>   #define __ASM_POWERPC_KFENCE_H
>   
> +#include 

Why do you need that? It can't be needed by the extern bool you are 
adding below.

If it is needed by some C file that includes asm/kfence.h, it should 
include linux/kfence.h by itself, see for instance 
mm/kfence/kfence_test.c and mm/kfence/core.c

>   #include 
>   #include 
>   
> @@ -15,6 +16,10 @@
>   #define ARCH_FUNC_PREFIX "."
>   #endif
>   
> +#ifdef CONFIG_KFENCE
> +extern bool kfence_early_init;
> +#endif
> +
>   static inline bool arch_kfence_init_pool(void)
>   {
>   return true;
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 15e88f1439ec..fccbf92f279b 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -31,6 +31,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include 
>   
> @@ -291,9 +292,8 @@ static unsigned long next_boundary(unsigned long addr, 
> unsigned long end)
>   return end;
>   }
>   
> -static int __meminit create_physical_mapping(unsigned long start,
> -  unsigned long end,
> -  int nid, pgprot_t _prot)
> +static int __meminit create_physical_mapping(unsigned long start, unsigned 
> long end, int nid,
> +  pgprot_t _prot, unsigned long 
> mapping_sz_limit)
>   {
>   unsigned long vaddr, addr, mapping_size = 0;
>   bool prev_exec, exec = false;
> @@ -301,7 +301,10 @@ static int __meminit create_physical_mapping(unsigned 
> long start,
>   int psize;
>   unsigned long max_mapping_size = memory_block_size;
>   
> - if (debug_pagealloc_enabled_or_kfence())
> + if (mapping_sz_limit < max_mapping_size)
> + max_mapping_size = mapping_sz_limit;
> +
> + if (debug_pagealloc_enabled())
>   max_mapping_size = PAGE_SIZE;
>   
>   start = ALIGN(start, PAGE_SIZE);
> @@ -358,6 +361,7 @@ static int __meminit create_physical_mapping(unsigned 
> long start,
>   
>   static void __init radix_init_pgtable(void)
>   {
> + phys_addr_t kfence_pool __maybe_unused;

Don't do that. Avoid using __maybe_unused.

Instead, declare this var where it is used.
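
For instance (untested, only to illustrate scoping the declaration inside
the block that uses it; if the pool address is needed again later in the
function, this may need a different structure):

	#ifdef CONFIG_KFENCE
		if (kfence_early_init) {
			phys_addr_t kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE,
								      PAGE_SIZE);

			memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
		}
	#endif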

>   unsigned long rts_field;
>   phys_addr_t start, end;
>   u64 i;
> @@ -365,6 +369,13 @@ static void __init radix_init_pgtable(void)
>   /* We don't support slb for radix */
>   slb_set_size(0);
>   
> +#ifdef CONFIG_KFENCE
> + if (kfence_early_init) {

Declare kfence_pool here.

> + kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
> + memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
> + }
> +#endif
> +
>   /*
>* Create the linear mapping
>*/
> @@ -380,10 +391,18 @@ static void __init radix_init_pgtable(void)
>   continue;
>   }
>   
> - WARN_ON(create_physical_mapping(start, end,
> - -1, PAGE_KERNEL));
> + WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, 
> ~0UL));
>   }
>   
> +#ifdef CONFIG_KFENCE
> + if (kfence_early_init) {
> 

[powerpc:topic/kdump-hotplug] BUILD SUCCESS 9803af291162dbca4b9773586a3f5c392f0dd974

2024-05-07 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/kdump-hotplug
branch HEAD: 9803af291162dbca4b9773586a3f5c392f0dd974  powerpc/crash: remove 
unnecessary NULL check before kvfree()

elapsed time: 1453m

configs tested: 180
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
arc nsimosci_hs_defconfig   gcc  
arc   randconfig-001-20240507   gcc  
arc   randconfig-002-20240507   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   clang
arm  allyesconfig   gcc  
arm am200epdkit_defconfig   gcc  
arm defconfig   clang
armdove_defconfig   gcc  
arm   randconfig-001-20240507   gcc  
arm   randconfig-002-20240507   clang
arm   randconfig-003-20240507   gcc  
arm   randconfig-004-20240507   clang
arm  sp7021_defconfig   gcc  
armvt8500_v6_v7_defconfig   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-001-20240507   clang
arm64 randconfig-002-20240507   clang
arm64 randconfig-003-20240507   clang
arm64 randconfig-004-20240507   clang
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20240507   gcc  
csky  randconfig-002-20240507   gcc  
hexagon  allmodconfig   clang
hexagon   allnoconfig   clang
hexagon  allyesconfig   clang
hexagon defconfig   clang
hexagon   randconfig-001-20240507   clang
hexagon   randconfig-002-20240507   clang
i386 allmodconfig   gcc  
i386  allnoconfig   gcc  
i386 allyesconfig   gcc  
i386 buildonly-randconfig-001-20240506   gcc  
i386 buildonly-randconfig-002-20240506   clang
i386 buildonly-randconfig-003-20240506   gcc  
i386 buildonly-randconfig-004-20240506   gcc  
i386 buildonly-randconfig-005-20240506   gcc  
i386 buildonly-randconfig-006-20240506   clang
i386defconfig   clang
i386  randconfig-001-20240506   gcc  
i386  randconfig-002-20240506   clang
i386  randconfig-003-20240506   gcc  
i386  randconfig-004-20240506   clang
i386  randconfig-005-20240506   clang
i386  randconfig-006-20240506   gcc  
i386  randconfig-011-20240506   gcc  
i386  randconfig-012-20240506   gcc  
i386  randconfig-013-20240506   gcc  
i386  randconfig-014-20240506   clang
i386  randconfig-015-20240506   clang
i386  randconfig-016-20240506   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarch   defconfig   gcc  
loongarch randconfig-001-20240507   gcc  
loongarch randconfig-002-20240507   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68km5407c3_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips  allnoconfig   gcc  
mips allyesconfig   gcc  
mips  ath25_defconfig   clang
mipsmaltaup_xpa_defconfig   gcc  
mipsqi_lb60_defconfig   clang
nios2allmodconfig   gcc  
nios2 allnoconfig   gcc  
nios2allyesconfig   gcc  
nios2

Re: [kvm-unit-tests PATCH v9 12/31] powerpc: general interrupt tests

2024-05-07 Thread Thomas Huth

On 04/05/2024 14.28, Nicholas Piggin wrote:

Add basic testing of various kinds of interrupts, machine check,
page fault, illegal, decrementer, trace, syscall, etc.

This has a known failure on QEMU TCG pseries machines where MSR[ME]
can be incorrectly set to 0.

Signed-off-by: Nicholas Piggin 
---
  lib/powerpc/asm/processor.h |   4 +
  lib/powerpc/asm/reg.h   |  17 ++
  lib/powerpc/setup.c |  11 +
  lib/ppc64/asm/ptrace.h  |  16 ++
  powerpc/Makefile.common |   3 +-
  powerpc/interrupts.c| 414 
  powerpc/unittests.cfg   |   3 +
  7 files changed, 467 insertions(+), 1 deletion(-)
  create mode 100644 powerpc/interrupts.c


Acked-by: Thomas Huth 



Re: [kvm-unit-tests PATCH v9 03/31] powerpc: Mark known failing tests as kfail

2024-05-07 Thread Thomas Huth

On 07/05/2024 06.07, Nicholas Piggin wrote:

On Mon May 6, 2024 at 5:37 PM AEST, Thomas Huth wrote:

On 04/05/2024 14.28, Nicholas Piggin wrote:

Mark the failing h_cede_tm and spapr_vpa tests as kfail.

Signed-off-by: Nicholas Piggin 
---
   powerpc/spapr_vpa.c | 3 ++-
   powerpc/tm.c| 3 ++-
   2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/powerpc/spapr_vpa.c b/powerpc/spapr_vpa.c
index c2075e157..46fa0485c 100644
--- a/powerpc/spapr_vpa.c
+++ b/powerpc/spapr_vpa.c
@@ -150,7 +150,8 @@ static void test_vpa(void)
report_fail("Could not deregister after registration");
   
   	disp_count1 = be32_to_cpu(vpa->vp_dispatch_count);

-   report(disp_count1 % 2 == 1, "Dispatch count is odd after deregister");
+   /* TCG known fail, could be wrong test, must verify against PowerVM */
+   report_kfail(true, disp_count1 % 2 == 1, "Dispatch count is odd after 
deregister");


Using "true" as first argument looks rather pointless - then you could also
simply delete the test completely if it can never be tested reliably.

Thus could you please introduce a helper function is_tcg() that could be
used to check whether we run under TCG (and not KVM)? I think you could
check for "linux,kvm" in the "compatible" property in /hypervisor in the
device tree to see whether we're running in KVM mode or in TCG mode.
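
For what it's worth, a minimal sketch of such a helper (assumptions mine:
libfdt is available and the flattened device tree blob is already in hand;
kvm-unit-tests' own device-tree wrappers may make this shorter):

#include <libfdt.h>
#include <stdbool.h>
#include <string.h>

static bool is_tcg(const void *fdt)
{
	const char *compat;
	int node, len, off;

	node = fdt_path_offset(fdt, "/hypervisor");
	if (node < 0)
		return true;	/* no /hypervisor node: assume TCG */

	compat = fdt_getprop(fdt, node, "compatible", &len);
	if (!compat || len <= 0)
		return true;

	/* "compatible" can hold several NUL-separated strings */
	for (off = 0; off < len; off += strlen(compat + off) + 1) {
		if (!strcmp(compat + off, "linux,kvm"))
			return false;	/* running under KVM */
	}
	return true;		/* TCG (or some other hypervisor) */
}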


This I added in patch 30.

The reason for the suboptimal patch ordering was just me being lazy and
avoiding rebasing annoyance. I'd written a bunch of failing test cases
for QEMU work, but hadn't done the kvm/tcg test yet. It had a few
conflicts so I put it at the end... can rebase if you'd really prefer.


Ah, ok, no need to rebase then, as long it's there in the end, it's fine.

 Thanks,
  Thomas



[PATCH v7] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-05-07 Thread Gautam Menghani
The PAPR hypervisor has introduced three new counters in the VPA area of
LPAR CPUs for KVM L2 guest observability (see [1] for terminology): two
for context switches from host to guest and vice versa, and one counter
for the total time spent inside the KVM guest. Add a tracepoint that
enables reading the counters for use by ftrace/perf. Note that this
tracepoint is only available with the nestedv2 API (i.e., KVM on PowerVM).

[1] Terminology:
a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor
b. L2 refers to the KVM guest booted on top of L1.

Acked-by: Naveen N Rao 
Signed-off-by: Vaibhav Jain 
Signed-off-by: Gautam Menghani 
---
v6 -> v7:
1. Use TRACE_EVENT_FN_COND to handle zero counters case.
2. Use for_each_present_cpu() to handle hotplugs.

v5 -> v6:
1. Use TRACE_EVENT_FN to enable/disable counters only once.
2. Remove the agg. counters from vcpu->arch.
3. Use PACA to maintain old counter values instead of zeroing on every
entry.
4. Simplify variable names

v4 -> v5:
1. Define helper functions for getting/setting the accumulation counter
in L2's VPA

v3 -> v4:
1. After vcpu_run, check the VPA flag instead of checking for tracepoint
being enabled for disabling the cs time accumulation.

v2 -> v3:
1. Move the counter disabling and zeroing code to a different function.
2. Move the get_lppaca() inside the tracepoint_enabled() branch.
3. Add the aggregation logic to maintain total context switch time.

v1 -> v2:
1. Fix the build error due to invalid struct member reference.

 arch/powerpc/include/asm/lppaca.h | 11 +--
 arch/powerpc/include/asm/paca.h   |  5 +++
 arch/powerpc/kvm/book3s_hv.c  | 52 +++
 arch/powerpc/kvm/trace_hv.h   | 29 +
 4 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index 61ec2447dabf..f40a646bee3c 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -62,7 +62,8 @@ struct lppaca {
u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
u8  fpregs_in_use;
u8  pmcregs_in_use;
-   u8  reserved8[28];
+   u8  l2_counters_enable;  /* Enable usage of counters for KVM guest 
*/
+   u8  reserved8[27];
__be64  wait_state_cycles;  /* Wait cycles for this proc */
u8  reserved9[28];
__be16  slb_count;  /* # of SLBs to maintain */
@@ -92,9 +93,13 @@ struct lppaca {
/* cacheline 4-5 */
 
__be32  page_ins;   /* CMO Hint - # page ins by OS */
-   u8  reserved12[148];
+   u8  reserved12[28];
+   volatile __be64 l1_to_l2_cs_tb;
+   volatile __be64 l2_to_l1_cs_tb;
+   volatile __be64 l2_runtime_tb;
+   u8 reserved13[96];
volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
-   u8  reserved13[96];
+   u8  reserved14[96];
 } cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 1d58da946739..f20ac7a6efa4 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -278,6 +278,11 @@ struct paca_struct {
struct mce_info *mce_info;
u8 mce_pending_irq_work;
 #endif /* CONFIG_PPC_BOOK3S_64 */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   u64 l1_to_l2_cs;
+   u64 l2_to_l1_cs;
+   u64 l2_runtime_agg;
+#endif
 } cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8e86eb577eb8..6acfe59c9b4f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4108,6 +4108,54 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu 
*vcpu)
}
 }
 
+static inline int kvmhv_get_l2_counters_status(void)
+{
+   return get_lppaca()->l2_counters_enable;
+}
+
+static inline void kvmhv_set_l2_counters_status(int cpu, bool status)
+{
+   if (status)
+   lppaca_of(cpu).l2_counters_enable = 1;
+   else
+   lppaca_of(cpu).l2_counters_enable = 0;
+}
+
+int kmvhv_counters_tracepoint_regfunc(void)
+{
+   int cpu;
+
+   for_each_present_cpu(cpu) {
+   kvmhv_set_l2_counters_status(cpu, true);
+   }
+   return 0;
+}
+
+void kmvhv_counters_tracepoint_unregfunc(void)
+{
+   int cpu;
+
+   for_each_present_cpu(cpu) {
+   kvmhv_set_l2_counters_status(cpu, false);
+   }
+}
+
+static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
+{
+   struct lppaca *lp = get_lppaca();
+   u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
+
+   l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
+   l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
+   l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
+   trace_kvmppc_vcpu_stats(vcpu, l1_to_l2_ns - local_paca->

Re: [PATCH v4 2/2] powerpc/bpf: enable kfunc call

2024-05-07 Thread Naveen N Rao
On Thu, May 02, 2024 at 11:02:05PM GMT, Hari Bathini wrote:
> Currently, bpf jit code on powerpc assumes all the bpf functions and
> helpers to be part of core kernel text. This is false for kfunc case,
> as function addresses may not be part of core kernel text area. So,
> add support for addresses that are not within core kernel text area
> too, to enable kfunc support. Emit instructions based on whether the
> function address is within core kernel text address or not, to retain
> optimized instruction sequence where possible.
> 
> In case of PCREL, as a bpf function that is not within core kernel
> text area is likely to go out of range with relative addressing on
> kernel base, use PC relative addressing. If that goes out of range,
> load the full address with PPC_LI64().
> 
> With addresses that are not within core kernel text area supported,
> override bpf_jit_supports_kfunc_call() to enable kfunc support. Also,
> override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers,
> as an address offset can be more than 32-bit long on PPC64.
> 
> Signed-off-by: Hari Bathini 
> ---
> 
> * Changes in v4:
>   - Use either kernelbase or PC for relative addressing. Also, fallback
> to PPC_LI64(), if both are out of range.
>   - Update r2 with kernel TOC for elfv1 too as elfv1 also uses the
> optimization sequence, that expects r2 to be kernel TOC, when
> function address is within core kernel text.
> 
> * Changes in v3:
>   - Retained optimized instruction sequence when function address is
> a core kernel address as suggested by Naveen.
>   - Used unoptimized instruction sequence for PCREL addressing to
> avoid out of range errors for core kernel function addresses.
>   - Folded patch that adds support for kfunc calls with patch that
> enables/advertises this support as suggested by Naveen.
> 
> 
>  arch/powerpc/net/bpf_jit_comp.c   | 10 +
>  arch/powerpc/net/bpf_jit_comp64.c | 61 ++-
>  2 files changed, 61 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 0f9a21783329..984655419da5 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
>  
>   bpf_prog_unlock_free(fp);
>  }
> +
> +bool bpf_jit_supports_kfunc_call(void)
> +{
> + return true;
> +}
> +
> +bool bpf_jit_supports_far_kfunc_call(void)
> +{
> + return IS_ENABLED(CONFIG_PPC64);
> +}
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 4de08e35e284..8afc14a4a125 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -208,17 +208,13 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, 
> struct codegen_context *ctx,
>   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
>   long reladdr;
>  
> - if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
> + if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
>   return -EINVAL;
>  
> - if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
> - reladdr = func_addr - local_paca->kernelbase;
> +#ifdef CONFIG_PPC_KERNEL_PCREL

Would be good to retain use of IS_ENABLED().
Reviewed-by: Naveen N Rao 


- Naveen



Re: [PATCH v4 1/2] powerpc64/bpf: fix tail calls for PCREL addressing

2024-05-07 Thread Naveen N Rao
On Thu, May 02, 2024 at 11:02:04PM GMT, Hari Bathini wrote:
> With PCREL addressing, there is no kernel TOC. So, it is not set up in
> the prologue when PCREL addressing is used. But the number of instructions
> to skip on a tail call was not adjusted accordingly. That resulted in
> not-so-obvious failures while using tailcalls. The 'tailcalls' selftest
> crashed the system with the below call trace:
> 
>   bpf_test_run+0xe8/0x3cc (unreliable)
>   bpf_prog_test_run_skb+0x348/0x778
>   __sys_bpf+0xb04/0x2b00
>   sys_bpf+0x28/0x38
>   system_call_exception+0x168/0x340
>   system_call_vectored_common+0x15c/0x2ec
> 
> Also, as bpf programs are always module addresses and a bpf helper in
> general is a core kernel text address, using PC relative addressing
> often fails with "out of range of pcrel address" error. Switch to
> using kernel base for relative addressing to handle this better.
> 
> Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL 
> addresing")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Hari Bathini 
> ---
> 
> * Changes in v4:
>   - Fix out of range errors by switching to kernelbase instead of PC
> for relative addressing.
> 
> * Changes in v3:
>   - New patch to fix tailcall issues with PCREL addressing.
> 
> 
>  arch/powerpc/net/bpf_jit_comp64.c | 30 --
>  1 file changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 79f23974a320..4de08e35e284 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -202,7 +202,8 @@ void bpf_jit_build_epilogue(u32 *image, struct 
> codegen_context *ctx)
>   EMIT(PPC_RAW_BLR());
>  }
>  
> -static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context 
> *ctx, u64 func)
> +static int
> +bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context 
> *ctx, u64 func)
>  {
>   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
>   long reladdr;
> @@ -211,19 +212,20 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, 
> struct codegen_context *ctx, u
>   return -EINVAL;
>  
>   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
> - reladdr = func_addr - CTX_NIA(ctx);
> + reladdr = func_addr - local_paca->kernelbase;
>  
>   if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
> - pr_err("eBPF: address of %ps out of range of pcrel 
> address.\n",
> - (void *)func);
> + pr_err("eBPF: address of %ps out of range of 34-bit 
> relative address.\n",
> +(void *)func);
>   return -ERANGE;
>   }
> - /* pla r12,addr */
> - EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
> - EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
> - EMIT(PPC_RAW_MTCTR(_R12));
> - EMIT(PPC_RAW_BCTR());
> -
> + EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, 
> kernelbase)));
> + /* Align for subsequent prefix instruction */
> + if (!IS_ALIGNED((unsigned long)fimage + CTX_NIA(ctx), 8))
> + EMIT(PPC_RAW_NOP());

We don't need the prefix instruction to be aligned to a doubleword 
boundary - it just shouldn't cross a 64-byte boundary. Since we know the 
exact address of the instruction here, we should be able to check for 
that case.
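
Something along these lines, as a sketch (reusing fimage/CTX_NIA from the
hunk above; offset 60 is the only position within a 64-byte block where
the 8-byte prefixed instruction would straddle the boundary):

	if (((unsigned long)fimage + CTX_NIA(ctx)) % 64 == 60)
		EMIT(PPC_RAW_NOP());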

> + /* paddi r12,r12,addr */
> + EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr));
> + EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | 
> IMM_L(reladdr));
>   } else {
>   reladdr = func_addr - kernel_toc_addr();
>   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
> @@ -233,9 +235,9 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct 
> codegen_context *ctx, u
>  
>   EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr)));
>   EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
> - EMIT(PPC_RAW_MTCTR(_R12));
> - EMIT(PPC_RAW_BCTRL());
>   }
> + EMIT(PPC_RAW_MTCTR(_R12));
> + EMIT(PPC_RAW_BCTRL());

This change shouldn't be necessary since these instructions are moved 
back into the conditional in the next patch.

Other than those minor comments:
Reviewed-by: Naveen N Rao 


- Naveen



Re: [PATCH 6/7] powerpc: Replace CONFIG_4xx with CONFIG_44x

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:51, Michael Ellerman wrote:
> Replace 4xx usage with 44x, and replace 4xx_SOC with 44x.

You can go one step further because CONFIG_44x implies CONFIG_BOOKE, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/d404d1257cd1d36aa6b25237f54eb2c2223ce447.1607519517.git.christophe.le...@csgroup.eu/

> 
> Signed-off-by: Michael Ellerman 
> ---
>   arch/powerpc/Kconfig   | 5 +
>   arch/powerpc/include/asm/cacheflush.h  | 2 +-
>   arch/powerpc/include/asm/ppc_asm.h | 2 +-
>   arch/powerpc/kernel/entry_32.S | 6 +++---
>   arch/powerpc/kernel/process.c  | 2 +-
>   arch/powerpc/mm/fault.c| 4 ++--
>   arch/powerpc/mm/ptdump/Makefile| 2 +-
>   arch/powerpc/platforms/4xx/Makefile| 2 +-
>   arch/powerpc/platforms/Kconfig.cputype | 8 +---
>   arch/powerpc/sysdev/Kconfig| 4 ++--
>   10 files changed, 14 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9a7d2b218516..2b6fa87464a5 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -488,7 +488,7 @@ source "kernel/Kconfig.hz"
>   
>   config MATH_EMULATION
>   bool "Math emulation"
> - depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
> + depends on 44x || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
>   select PPC_FPU_REGS
>   help
> Some PowerPC chips designed for embedded applications do not have
> @@ -1102,9 +1102,6 @@ config PPC4xx_CPM
> It also enables support for two different idle states (idle-wait
> and idle-doze).
>   
> -config 4xx_SOC
> - bool
> -
>   config FSL_LBC
>   bool "Freescale Local Bus support"
>   help
> diff --git a/arch/powerpc/include/asm/cacheflush.h 
> b/arch/powerpc/include/asm/cacheflush.h
> index ef7d2de33b89..f2656774aaa9 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -121,7 +121,7 @@ static inline void invalidate_dcache_range(unsigned long 
> start,
>   mb();   /* sync */
>   }
>   
> -#ifdef CONFIG_4xx
> +#ifdef CONFIG_44x
>   static inline void flush_instruction_cache(void)
>   {
>   iccci((void *)KERNELBASE);
> diff --git a/arch/powerpc/include/asm/ppc_asm.h 
> b/arch/powerpc/include/asm/ppc_asm.h
> index 1d1018c1e482..02897f4b0dbf 100644
> --- a/arch/powerpc/include/asm/ppc_asm.h
> +++ b/arch/powerpc/include/asm/ppc_asm.h
> @@ -482,7 +482,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, 
> CPU_FTR_CELL_TB_BUG, 96)
>* and they must be used.
>*/
>   
> -#if !defined(CONFIG_4xx) && !defined(CONFIG_PPC_8xx)
> +#if !defined(CONFIG_44x) && !defined(CONFIG_PPC_8xx)
>   #define tlbia   \
>   li  r4,1024;\
>   mtctr   r4; \
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 1522164b10e4..98ad926c056f 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -211,7 +211,7 @@ start_kernel_thread:
>   
>   .globl  fast_exception_return
>   fast_exception_return:
> -#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> +#if !(defined(CONFIG_44x) || defined(CONFIG_BOOKE))
>   andi.   r10,r9,MSR_RI   /* check for recoverable interrupt */
>   beq 3f  /* if not, we've got problems */
>   #endif
> @@ -365,7 +365,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
>   rfi
>   _ASM_NOKPROBE_SYMBOL(interrupt_return)
>   
> -#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
> +#if defined(CONFIG_44x) || defined(CONFIG_BOOKE)
>   
>   /*
>* Returning from a critical interrupt in user mode doesn't need
> @@ -469,4 +469,4 @@ ret_from_mcheck_exc:
>   RET_FROM_EXC_LEVEL(SPRN_MCSRR0, SPRN_MCSRR1, PPC_RFMCI)
>   _ASM_NOKPROBE_SYMBOL(ret_from_mcheck_exc)
>   #endif /* CONFIG_BOOKE */
> -#endif /* !(CONFIG_4xx || CONFIG_BOOKE) */
> +#endif /* !(CONFIG_44x || CONFIG_BOOKE) */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 196cfa41ad6e..cddb4c099bbd 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1566,7 +1566,7 @@ static void __show_regs(struct pt_regs *regs)
>   if (trap == INTERRUPT_MACHINE_CHECK ||
>   trap == INTERRUPT_DATA_STORAGE ||
>   trap == INTERRUPT_ALIGNMENT) {
> - if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> + if (IS_ENABLED(CONFIG_44x) || IS_ENABLED(CONFIG_BOOKE))
>   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dear, 
> regs->esr);
>   else
>   pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
> regs->dsisr);
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 53335ae21a40..9af44ddf4b53 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -361,13 +361,13 @@ static void sanity_che

Re: [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:19, Athira Rajeev wrote:
> There are cases where a global register variable is defined and associated
> with a specified register. For example, in powerpc, two registers are
> defined to represent variables:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
> 
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");

What about r2:

register struct task_struct *current asm ("r2");

> 
> These regs are present in dwarf debug as DW_OP_reg as part of variables
> in the cu_die (compile unit). These are not present in die search done
> in the list of nested scopes since these are global register variables.
> 
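
As a rough sketch of what such a lookup can look like (assumptions mine:
elfutils libdw, a CU DIE already resolved for the sample, and DWARF GPR
numbering matching the powerpc GPR number):

#include <dwarf.h>
#include <elfutils/libdw.h>
#include <stdio.h>

/* Walk the CU DIE's direct children and report variables whose location
 * expression is a plain DW_OP_regN, i.e. global register variables. */
static void find_global_reg_vars(Dwarf_Die *cu_die, int target_reg)
{
	Dwarf_Die die;

	if (dwarf_child(cu_die, &die) != 0)
		return;

	do {
		Dwarf_Attribute attr;
		Dwarf_Op *ops;
		size_t nops;

		if (dwarf_tag(&die) != DW_TAG_variable)
			continue;
		if (!dwarf_attr(&die, DW_AT_location, &attr))
			continue;
		if (dwarf_getlocation(&attr, &ops, &nops) != 0 || nops != 1)
			continue;
		if (ops[0].atom == DW_OP_reg0 + target_reg)
			printf("%s lives in r%d\n", dwarf_diename(&die),
			       target_reg);
	} while (dwarf_siblingof(&die, &die) == 0);
}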


Re: [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:19, Athira Rajeev wrote:
> Update instruction tracking with the add instruction. Apart from the
> "mr" instruction, the register state is carried on by other insns,
> i.e. "add, addi, addis". Since these are not memory instructions and
> don't fall in the primary opcode range of 32 to 63, add them to the
> mnemonic table. For now, only add* instructions are added; more may be
> added here later. But the raw binary code is still used to extract the
> regs, so associate these with "load_store_ops" itself and no other
> changes are required.

Looks fragile.

How do you handle addc, adde, addic ?
And also addme, addze, which only have two operands instead of 3 ?

> 
> Signed-off-by: Athira Rajeev 
> ---
>   .../perf/arch/powerpc/annotate/instructions.c | 21 +++
>   tools/perf/util/disasm.c  |  1 +
>   2 files changed, 22 insertions(+)
> 
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c 
> b/tools/perf/arch/powerpc/annotate/instructions.c
> index cce7023951fe..1f35d8a65bb4 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -1,6 +1,17 @@
>   // SPDX-License-Identifier: GPL-2.0
>   #include 
>   
> +/*
> + * powerpc instruction mnemonic table to associate load/store instructions 
> with
> + * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
> + */
> +static struct ins powerpc__instructions[] = {
> + { .name = "mr", .ops = &load_store_ops,  },
> + { .name = "addi",   .ops = &load_store_ops,   },
> + { .name = "addis",  .ops = &load_store_ops,  },
> + { .name = "add",.ops = &load_store_ops,  },
> +};
> +
>   static struct ins_ops *powerpc__associate_instruction_ops(struct arch 
> *arch, const char *name)
>   {
>   int i;
> @@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state 
> *state,
>   if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
>   return;
>   
> + if (!strncmp(dl->ins.name, "add", 3))
> + goto regs_check;
> +
>   if (strncmp(dl->ins.name, "mr", 2))
>   return;
>   
> @@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state 
> *state,
>   dst->reg1 = src_reg;
>   }
>   
> +regs_check:
>   if (!has_reg_type(state, dst->reg1))
>   return;
>   
> @@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state 
> *state __maybe_unused, s
>   static int powerpc__annotate_init(struct arch *arch, char *cpuid 
> __maybe_unused)
>   {
>   if (!arch->initialized) {
> + arch->nr_instructions = ARRAY_SIZE(powerpc__instructions);
> + arch->instructions = calloc(arch->nr_instructions, 
> sizeof(struct ins));
> + if (!arch->instructions)
> + return -ENOMEM;
> + memcpy(arch->instructions, powerpc__instructions, sizeof(struct 
> ins) * arch->nr_instructions);
> + arch->nr_instructions_allocated = arch->nr_instructions;
>   arch->initialized = true;
>   arch->associate_instruction_ops = 
> powerpc__associate_instruction_ops;
>   arch->objdump.comment_char  = '#';
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index ac6b8b8da38a..32cf506a9010 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -36,6 +36,7 @@ static struct ins_ops mov_ops;
>   static struct ins_ops nop_ops;
>   static struct ins_ops lock_ops;
>   static struct ins_ops ret_ops;
> +static struct ins_ops load_store_ops;
>   
>   static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>  struct ins_operands *ops, int max_ins_name);


[PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-05-07 Thread Vignesh Balasubramanian
Add a new .note section containing type, size, offset and flags of
every xfeature that is present.

This information will be used by the debuggers to understand the XSAVE
layout of the machine where the core file is dumped, and to read XSAVE
registers, especially during cross-platform debugging.

Some background:

The XSAVE layouts of modern AMD and Intel CPUs differ, especially since
Memory Protection Keys and the AVX-512 features have been incorporated
into AMD CPUs.
This is because AMD never adopted (and hence never left room in the XSAVE
layout for) the Intel MPX feature. Tools like GDB had assumed a fixed XSAVE
layout matching that of Intel (based on the XCR0 mask).
Hence, the core dumps from AMD CPUs didn't match the known size for the
XCR0 mask. This resulted in GDB and other tools not being able to access
the values of the AVX-512 and PKRU registers on AMD CPUs.
To solve this, an interim solution has been accepted into GDB, and is
already a part of GDB 14, thanks to these series of patches
[ https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html ].
But this patch series depends on heuristics based on the total XSAVE
register set size and the XCR0 mask to infer the layouts of the various
register blocks for core dumps, and hence, is not a foolproof mechanism to
determine the layout of the XSAVE area.

Hence this new core dump note has been proposed as a more sturdy mechanism
to allow GDB/LLDB and other relevant tools to determine the layout of the
XSAVE area of the machine where the corefile was dumped.
The new core dump note (which is being proposed as a per-process .note
section), NT_X86_XSAVE_LAYOUT (0x205), contains an array of structures.
Each structure describes an individual extended feature with its offset,
size and flags (obtained through the CPUID instruction) in a format
roughly matching the following C structure:

struct xfeat_component {
   u32 xfeat_type;
   u32 xfeat_sz;
   u32 xfeat_off;
   u32 xfeat_flags;
};
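
For reference, a rough user-space sketch of how the values in such a
record can be derived (assumptions mine: GCC/Clang cpuid.h and the
documented CPUID leaf 0xD sub-leaf semantics; the kernel side of the
patch below does the equivalent from its cached xstate bookkeeping):

#include <cpuid.h>
#include <stdint.h>

struct xfeat_component {
	uint32_t xfeat_type;
	uint32_t xfeat_sz;
	uint32_t xfeat_off;
	uint32_t xfeat_flags;
};

/* CPUID(0xD, i): EAX = size of state component i, EBX = its offset from
 * the start of the non-compacted XSAVE area, ECX bit 0 = supervisor
 * state, ECX bit 1 = 64-byte aligned in the compacted format. */
static void fill_xfeat(uint32_t i, struct xfeat_component *c)
{
	uint32_t eax, ebx, ecx, edx;

	__cpuid_count(0xD, i, eax, ebx, ecx, edx);
	c->xfeat_type  = i;
	c->xfeat_sz    = eax;
	c->xfeat_off   = ebx;
	c->xfeat_flags = ecx & 0x3;
}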

Co-developed-by: Jini Susan George 
Signed-off-by: Jini Susan George 
Signed-off-by: Vignesh Balasubramanian 
---
v1->v2: Removed kernel internal defn dependency, code improvements

 arch/x86/Kconfig |   1 +
 arch/x86/include/asm/elf.h   |  34 +
 arch/x86/kernel/fpu/xstate.c | 141 +++
 fs/binfmt_elf.c  |   4 +-
 include/uapi/linux/elf.h |   1 +
 5 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 928820e61cb5..cc67daab3396 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -105,6 +105,7 @@ config X86
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_HAVE_EXTRA_ELF_NOTES
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 1fb83d47711f..5952574db64b 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -13,6 +13,40 @@
 #include 
 #include 
 
+struct xfeat_component {
+   u32 xfeat_type;
+   u32 xfeat_sz;
+   u32 xfeat_off;
+   u32 xfeat_flags;
+} __packed;
+
+_Static_assert(sizeof(struct xfeat_component)%4 == 0, "xfeat_component is not 
aligned");
+
+enum custom_feature {
+   FEATURE_XSAVE_FP = 0,
+   FEATURE_XSAVE_SSE = 1,
+   FEATURE_XSAVE_YMM = 2,
+   FEATURE_XSAVE_BNDREGS = 3,
+   FEATURE_XSAVE_BNDCSR = 4,
+   FEATURE_XSAVE_OPMASK = 5,
+   FEATURE_XSAVE_ZMM_Hi256 = 6,
+   FEATURE_XSAVE_Hi16_ZMM = 7,
+   FEATURE_XSAVE_PT = 8,
+   FEATURE_XSAVE_PKRU = 9,
+   FEATURE_XSAVE_PASID = 10,
+   FEATURE_XSAVE_CET_USER = 11,
+   FEATURE_XSAVE_CET_SHADOW_STACK = 12,
+   FEATURE_XSAVE_HDC = 13,
+   FEATURE_XSAVE_UINTR = 14,
+   FEATURE_XSAVE_LBR = 15,
+   FEATURE_XSAVE_HWP = 16,
+   FEATURE_XSAVE_XTILE_CFG = 17,
+   FEATURE_XSAVE_XTILE_DATA = 18,
+   FEATURE_MAX,
+   FEATURE_XSAVE_EXTENDED_START = FEATURE_XSAVE_YMM,
+   FEATURE_XSAVE_EXTENDED_END = FEATURE_XSAVE_XTILE_DATA,
+};
+
 typedef unsigned long elf_greg_t;
 
 #define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t))
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 33a214b1a4ce..3d1c3c96e34d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -87,6 +88,8 @@ static unsigned int xstate_flags[XFEATURE_MAX] 
__ro_after_init;
 #define XSTATE_FLAG_SUPERVISOR BIT(0)
 #define XSTATE_FLAG_ALIGNED64  BIT(1)
 
+static const char owner_name[] = "LINUX";
+
 /*
  * Return whether the system supports a given xfeature.
  *
@@ -1837,3 +1840,141 @@ int proc_pid_arch_status(struct seq_file *m, struct 
pid_namespace *ns,
return 0;
 }
 #endif /* CONFIG

[PATCH v2 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

2024-05-07 Thread Vignesh Balasubramanian
This patch proposes to add an extra .note section to the corefile to dump the 
CPUID information of a machine. This is being done to solve the issue of tools 
like debuggers having to deal with coredumps from machines with varying 
XSAVE layouts in spite of having the same XCR0 bits. The newly proposed .note 
section, at this point, consists of an array of records containing the 
information of each extended feature that is present. This provides details 
about the offsets and the sizes of the various extended save state components 
of the machine where the application crash occurred. Requesting a review for 
this patch.

Please NOTE that this patch has to be applied on top of the patch 
(https://lore.kernel.org/lkml/874jbt7qz3@oldenburg3.str.redhat.com/T/).


Vignesh Balasubramanian (1):
  x86/elf: Add a new .note section containing Xfeatures information to
x86 core files

 arch/x86/Kconfig |   1 +
 arch/x86/include/asm/elf.h   |  34 +
 arch/x86/kernel/fpu/xstate.c | 141 +++
 fs/binfmt_elf.c  |   4 +-
 include/uapi/linux/elf.h |   1 +
 5 files changed, 179 insertions(+), 2 deletions(-)

-- 
2.34.1



Re: [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:19, Athira Rajeev wrote:
> Add instruction tracking function "update_insn_state_powerpc" for
> powerpc. Example sequence in powerpc:
> 
> ld  r10,264(r3)
> mr  r31,r3
> <
> ld  r9,312(r31)

Your approach looks fragile.

mr is a simplified instruction which in fact is  "or r31, r3, r3"

By the way, the compiler sometimes does it differently, like below:

lwz r10,264(r3)
addir31, r3, 312
lwz r9, 0(r31)

And what about sequences with lwzu ?

> 
> Consider that the sample is pointing to: "ld r9,312(r31)".
> Here the memory reference is hit at "312(r31)", where 312 is the offset
> and r31 is the source register. The previous instruction sequence shows
> that the register state of r3 is moved to r31. So, to identify the data
> type for the r31 access, the previous instruction ("mr") needs to be
> tracked and the state type entry has to be updated. The current
> instruction tracking support in the perf tools infrastructure is specific
> to x86. The patch adds this for powerpc and adds the "mr" instruction to
> be tracked.
> 
> Signed-off-by: Athira Rajeev 


Re: [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:19, Athira Rajeev wrote:
> Use the raw instruction code and macros to identify memory instructions
> and to extract the register fields and the offset. The implementation
> addresses the D-form, X-form and DS-form instructions. Two main functions
> are added. A new parse function "load_store__parse" acts as the instruction
> ops parser for memory instructions. Unlike the other parsers (like
> mov__parse), this parser fills in only the "raw" field for source/target
> and the newly added "mem_ref" field. This is because there is no need to
> parse the disassembled code here; the arch specific macros take care of
> extracting the offset and regs, which is easier and more precise.
> 
> In powerpc, all instructions with a primary opcode from 32 to 63
> are memory instructions. Update the "ins__find" function to also take
> "opcode" as a parameter. Don't use "extract_reg_offset"; instead use the
> newly added function "get_arch_regs", which sets the reg1, reg2 and
> offset fields depending on whether it is the source or target operand.

Yes all instructions with a primary opcode from 32 to 63 are memory 
instructions, but not all memory instructions have opcode 32 to 63.
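
A sketch of what a finer-grained check for primary opcode 31 might look
like, keying off the extended opcode in bits 21-30 (XO values as in
arch/powerpc/include/asm/ppc-opcode.h; the list is illustrative only,
nowhere near exhaustive):

#define PPC_XO(insn)	(((insn) >> 1) & 0x3ff)

static int is_x_form_mem_ref(unsigned int insn)
{
	switch (PPC_XO(insn)) {
	case 21:	/* ldx   */
	case 23:	/* lwzx  */
	case 87:	/* lbzx  */
	case 149:	/* stdx  */
	case 151:	/* stwx  */
		return 1;
	default:
		return 0;
	}
}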

> 
> Signed-off-by: Athira Rajeev 
> ---
>   tools/perf/arch/powerpc/util/dwarf-regs.c | 33 +
>   tools/perf/util/annotate.c| 22 -
>   tools/perf/util/disasm.c  | 59 +--
>   tools/perf/util/disasm.h  |  4 +-
>   tools/perf/util/include/dwarf-regs.h  |  4 +-
>   5 files changed, 114 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c 
> b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index e60a71fd846e..3121c70dc0d3 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -102,6 +102,9 @@ int regs_query_register_offset(const char *name)
>   #define PPC_OP(op)  (((op) >> 26) & 0x3F)
>   #define PPC_RA(a)   (((a) >> 16) & 0x1f)
>   #define PPC_RT(t)   (((t) >> 21) & 0x1f)
> +#define PPC_RB(b)(((b) >> 11) & 0x1f)
> +#define PPC_D(D) ((D) & 0xfffe)
> +#define PPC_DS(DS)   ((DS) & 0xfffc)
>   
>   int get_opcode_insn(unsigned int raw_insn)
>   {
> @@ -117,3 +120,33 @@ int get_target_reg(unsigned int raw_insn)
>   {
>   return PPC_RT(raw_insn);
>   }
> +
> +int get_offset_opcode(int raw_insn __maybe_unused)
> +{
> + int opcode = PPC_OP(raw_insn);
> +
> + /* DS- form */
> + if ((opcode == 58) || (opcode == 62))

Can you re-use macros from arch/powerpc/include/asm/ppc-opcode.h ?

#define OP_LD   58
#define OP_STD  62


> + return PPC_DS(raw_insn);
> + else
> + return PPC_D(raw_insn);
> +}
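
With the macros above, the check could read, roughly:

	if (opcode == OP_LD || opcode == OP_STD)	/* DS-form */
		return PPC_DS(raw_insn);
	else
		return PPC_D(raw_insn);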
> +
> +/*
> + * Fills the required fields for op_loc depending on if it
> + * is a source of target.
> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
> + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
> + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
> + */
> +void get_arch_regs(int raw_insn __maybe_unused, int is_source 
> __maybe_unused, struct annotated_op_loc *op_loc __maybe_unused)
> +{
> + if (is_source)
> + op_loc->reg1 = get_source_reg(raw_insn);
> + else
> + op_loc->reg1 = get_target_reg(raw_insn);
> + if (op_loc->multi_regs)
> + op_loc->reg2 = PPC_RB(raw_insn);
> + if (op_loc->mem_ref)
> + op_loc->offset = get_offset_opcode(raw_insn);
> +}

> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 85692f73e78f..f41a0fadeab4 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -760,11 +800,20 @@ static void ins__sort(struct arch *arch)
>   qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
>   }
>   
> -static struct ins_ops *__ins__find(struct arch *arch, const char *name)
> +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int 
> opcode)
>   {
>   struct ins *ins;
>   const int nmemb = arch->nr_instructions;
>   
> + if (arch__is(arch, "powerpc")) {
> + /*
> +  * Instructions with a primary opcode from 32 to 63
> +  * are memory instructions in powerpc.
> +  */
> + if ((opcode >= 31) && (opcode <= 63))

Could just be if ((opcode & 0x20)) I guess.

By the way your test is wrong because opcode 31 is not only memory 
instructions, see example below (not exhaustive):

#define OP_31_XOP_TRAP  4   ==> No
#define OP_31_XOP_LDX   21  ==> Yes
#define OP_31_XOP_LWZX  23  ==> Yes
#define OP_31_XOP_LDUX  53  ==> Yes
#define OP_31_XOP_DCBST 54  ==> No
#define OP_31_XOP_LWZUX 55  ==> Yes
#define OP_31_XOP_TRAP_64   68  ==> No
#define OP_31_XOP_DCBF  86  ==> No
#define OP_31_XOP_LBZX  87  ==> Yes
#define OP_31_XOP_STDX  149 ==> Yes
#define O

Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump

2024-05-07 Thread Christophe Leroy


On 06/05/2024 at 14:19, Athira Rajeev wrote:
> Add support to capture and parse raw instruction in objdump.

What's the purpose of using 'objdump' for reading raw instructions ? 
Can't they be read directly without invoking 'objdump' ? It looks odd to 
me to use objdump to provide readable text and then parse it back.

> Currently, the perf tool infrastructure uses the "--no-show-raw-insn" option
> with "objdump" while disassembling. An example from powerpc with this option
> for an instruction address is:

Yes and that makes sense because the purpose of objdump is to provide 
human readable annotations, not to perform automated analysis. Am I 
missing something ?

> 
> Snippet from:
> objdump  --start-address= --stop-address=  -d 
> --no-show-raw-insn -C 
> 
> c10224b4: lwz r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, patch adds support to use raw
> instruction. With raw instruction, macros are added to extract opcode
> and register fields.
> 
> "struct ins_operands" and "struct ins" is updated to carry opcode and
> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> is updated to fill the raw instruction hex value and opcode in newly
> added fields. There is no changes in existing code paths, which parses
> the disassembled code. The architecture using the instruction name and
> present approach is not altered. Since this approach targets powerpc,
> the macro implementation is added for powerpc as of now.
> 
> Example:
> representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8 ld  r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
>  _____________________________________
> | 58  |  RT  |  RA  |      DS      |  |
>  -------------------------------------
>  0     6      11     16            30 31
> 
> Function "disasm_line__parse" is updated to capture:
> 
> line:38 01 81 e8 ld  r4,312(r1)
> opcode and raw instruction "38 01 81 e8"
> Raw instruction is used later to extract the reg/offset fields.
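
As a worked example (hypothetical stand-alone snippet, not part of the
patch): decoding the raw word 0xe8810138 with the DS-form field boundaries
shown above recovers the operands of "ld r4,312(r1)":

#include <stdio.h>

int main(void)
{
	unsigned int insn = 0xe8810138;	/* "38 01 81 e8" read as a LE word */

	printf("opcode=%u rt=%u ra=%u ds=%u\n",
	       (insn >> 26) & 0x3f,	/* 58  */
	       (insn >> 21) & 0x1f,	/* 4   */
	       (insn >> 16) & 0x1f,	/* 1   */
	       insn & 0xfffcU);		/* 312 */
	return 0;
}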
> 
> Signed-off-by: Athira Rajeev 
> ---


Re: [PATCH v1 1/9] selftests/powerpc/dexcr: Add -no-pie to hashchk tests

2024-05-07 Thread Andrew Donnellan
On Wed, 2024-04-17 at 21:23 +1000, Benjamin Gray wrote:
> The hashchk tests want to verify that the hash key is changed over
> exec. They do so by calculating hashes at the same address across an
> exec. This is made simpler by disabling PIE functionality, so we can
> re-execute ourselves and use the same addresses in the child.
> 
> While -fno-pie is already added, -no-pie is also required: -fno-pie only
> affects code generation, while -no-pie is needed at link time so the
> resulting test binary is not a position-independent executable.
> 
> Fixes: ca64da7574f8 ("selftests/powerpc/dexcr: Add hashst/hashchk
> test")
> Signed-off-by: Benjamin Gray 

This matches the gcc documentation.

Reviewed-by: Andrew Donnellan 
Tested-by: Andrew Donnellan 

> 
> ---
> 
> This is not related to features introduced in this series, just fixes
> the test added in the static DEXCR series.
> ---
>  tools/testing/selftests/powerpc/dexcr/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile
> b/tools/testing/selftests/powerpc/dexcr/Makefile
> index 76210f2bcec3..829ad075b4a4 100644
> --- a/tools/testing/selftests/powerpc/dexcr/Makefile
> +++ b/tools/testing/selftests/powerpc/dexcr/Makefile
> @@ -3,7 +3,7 @@ TEST_GEN_FILES := lsdexcr
>  
>  include ../../lib.mk
>  
> -$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie $(call cc-option,-mno-
> rop-protect)
> +$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call cc-
> option,-mno-rop-protect)
>  
>  $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c
>  $(TEST_GEN_FILES): ../utils.c ./dexcr.c

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited