[RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes

2024-01-26 Thread Jinghao Jia
Hi everyone,

This patch set makes the following 2 changes:

- It adds an exception opcode detector to prevent kprobing on INTs and UDs.
  These opcodes serves special purposes in the kernel and kprobing them
  will also cause the stack trace to be polluted by the copy buffer
  address. This is suggested by Masami.

- At the same time, this patch set also boosts more opcodes from the group
  2/3/4/5. The newly boosted opcodes are all arithmetic instructions with
  semantics that are easy to reason about, and therefore, they are able to
  be boosted and executed out-of-line. These instructions were not boosted
  previously because they use opcode extensions that are not handled by the
  kernel. But now with the instruction decoder they can be easily handled.
  Boosting (and further jump optimizing) these instructions leads to a 10x
  performance gain for a single probe on QEMU.

Jinghao Jia (2):
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: boost more instructions from grp2/3/4/5

 arch/x86/kernel/kprobes/core.c | 54 ++
 1 file changed, 41 insertions(+), 13 deletions(-)

--
2.43.0




[RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5

2024-01-26 Thread Jinghao Jia
With the instruction decoder, we are now able to decode and recognize
instructions with opcode extensions. There are more instructions in
these groups that can be boosted:

Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
Group 4: INC, DEC (byte operation)
Group 5: INC, DEC (word/doubleword/quadword operation)

These instructions are not boosted previously because there are reserved
opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
unmapped. As a result, kprobes attached to them requires two int3 traps
as being non-boostable also prevents jump-optimization.

Some simple tests on QEMU show that after boosting and jump-optimization
a single kprobe on these instructions with an empty pre-handler runs 10x
faster (~1000 cycles vs. ~100 cycles).

Since these instructions are mostly ALU operations and do not touch
special registers like RIP, let's boost them so that we get the
performance benefit.

Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 792b38d22126..f847bd9cc91b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
case 0x62:  /* bound */
case 0x70 ... 0x7f: /* Conditional jumps */
case 0x9a:  /* Call far */
-   case 0xc0 ... 0xc1: /* Grp2 */
case 0xcc ... 0xce: /* software exceptions */
-   case 0xd0 ... 0xd3: /* Grp2 */
case 0xd6:  /* (UD) */
case 0xd8 ... 0xdf: /* ESC */
case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
case 0xe8 ... 0xe9: /* near Call, JMP */
case 0xeb:  /* Short JMP */
case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
-   case 0xf6 ... 0xf7: /* Grp3 */
-   case 0xfe:  /* Grp4 */
/* ... are not boostable */
return 0;
+   case 0xc0 ... 0xc1: /* Grp2 */
+   case 0xd0 ... 0xd3: /* Grp2 */
+   /* ModR/M nnn == 110 is reserved */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
+   case 0xf6 ... 0xf7: /* Grp3 */
+   /* ModR/M nnn == 001 is reserved */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
+   case 0xfe:  /* Grp4 */
+   /* Only inc and dec are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
case 0xff:  /* Grp5 */
-   /* Only indirect jmp is boostable */
-   return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
+   /* Only inc, dec, and indirect jmp are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
default:
return 1;
}
-- 
2.43.0




[RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD

2024-01-26 Thread Jinghao Jia
Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
involved in LLVM-KCFI instrumentation. At the same time, attaching
kprobes on these instructions (particularly UDs) will pollute the stack
trace dumped in the kernel ring buffer, since the exception is triggered
in the copy buffer rather than the original location.

Check for INTs and UDs in can_probe and reject any kprobes trying to
attach to these instructions.

Suggested-by: Masami Hiramatsu (Google) 
Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index e8babebad7b8..792b38d22126 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t 
*buf, unsigned long add
return __recover_probed_insn(buf, addr);
 }
 
+static inline int is_exception_insn(struct insn *insn)
+{
+   if (insn->opcode.bytes[0] == 0x0f) {
+   /* UD0 / UD1 / UD2 */
+   return insn->opcode.bytes[1] == 0xff ||
+  insn->opcode.bytes[1] == 0xb9 ||
+  insn->opcode.bytes[1] == 0x0b;
+   } else {
+   /* INT3 / INT n / INTO / INT1 */
+   return insn->opcode.bytes[0] == 0xcc ||
+  insn->opcode.bytes[0] == 0xcd ||
+  insn->opcode.bytes[0] == 0xce ||
+  insn->opcode.bytes[0] == 0xf1;
+   }
+}
+
 /* Check if paddr is at an instruction boundary */
 static int can_probe(unsigned long paddr)
 {
@@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
 #endif
addr += insn.length;
}
+   __addr = recover_probed_instruction(buf, addr);
+   if (!__addr)
+   return 0;
+
+   if (insn_decode_kernel(, (void *)__addr) < 0)
+   return 0;
+
+   if (is_exception_insn())
+   return 0;
+
if (IS_ENABLED(CONFIG_CFI_CLANG)) {
/*
 * The compiler generates the following instruction sequence
@@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
 * Also, these movl and addl are used for showing expected
 * type. So those must not be touched.
 */
-   __addr = recover_probed_instruction(buf, addr);
-   if (!__addr)
-   return 0;
-
-   if (insn_decode_kernel(, (void *)__addr) < 0)
-   return 0;
-
if (insn.opcode.value == 0xBA)
offset = 12;
else if (insn.opcode.value == 0x3)
-- 
2.43.0




Re: [PATCH v6 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-01-26 Thread Google
On Sat, 27 Jan 2024 00:14:05 +0900
Masami Hiramatsu (Google)  wrote:

> On Thu, 25 Jan 2024 15:54:53 +0100
> Jiri Olsa  wrote:
> 
> > On Fri, Jan 12, 2024 at 07:10:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > Hi,
> > > 
> > > Here is the 6th version of the series to re-implement the fprobe on
> > > function-graph tracer. The previous version is;
> > > 
> > > https://lore.kernel.org/all/170290509018.220107.1347127510564358608.stgit@devnote2/
> > > 
> > > This version fixes use-after-unregister bug and arm64 stack unwinding
> > > bug [13/36], add an improvement for multiple interrupts during push
> > > operation[20/36], keep SAVE_REGS until BPF and fprobe_event using
> > > ftrace_regs[26/36], also reorder the patches[30/36][31/36] so that new
> > > fprobe can switch to SAVE_ARGS[32/36] safely.
> > > This series also temporarily adds a DIRECT_CALLS bugfix[1/36], which
> > > should be pushed separatedly as a stable bugfix.
> > > 
> > > There are some TODOs:
> > >  - Add s390x and loongarch support to fprobe (multiple fgraph).
> > >  - Fix to get the symbol address from ftrace entry address on arm64.
> > >(This should be done in BPF trace event)
> > >  - Cleanup code, rename some terms(offset/index) and FGRAPH_TYPE_BITMAP
> > >part should be merged to FGRAPH_TYPE_ARRAY patch.
> > 
> > hi,
> > I'm getting kasan bugs below when running bpf selftests on top of this
> > patchset.. I think it's probably the reason I see failures in some bpf
> > kprobe_multi/fprobe tests
> > 
> > so far I couldn't find the reason.. still checking ;-)
> 
> Thanks for reporting! Have you built the kernel with debuginfo? In that
> case, can you decode the line from the address?
> 
> $ eu-addr2line -fi -e vmlinux ftrace_push_return_trace.isra.0+0x346
> 
> This helps me a lot.

I also got the same KASAN error from intel's test bot:

https://lore.kernel.org/all/202401172217.36e37075-oliver.s...@intel.com/

And another one (it should be different one)

https://lore.kernel.org/all/202401172200.c8731564-oliver.s...@intel.com/

This is a selftest failure on i386. I might break something on 32bit.
Let me check.

Thank you,


> 
> Thank you,
> 
> > 
> > jirka
> > 
> > 
> > ---
> > [  507.585913][  T697] BUG: KASAN: slab-out-of-bounds in 
> > ftrace_push_return_trace.isra.0+0x346/0x370
> > [  507.586747][  T697] Write of size 8 at addr 888148193ff8 by task 
> > test_progs/697
> > [  507.587460][  T697] 
> > [  507.587713][  T697] CPU: 2 PID: 697 Comm: test_progs Tainted: G  
> >  OE  6.7.0+ #309 d8e2cbcdc10865c6eb2d28ed0cbf958842aa75a8
> > [  507.588821][  T697] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
> > BIOS 1.16.2-1.fc38 04/01/2014
> > [  507.589681][  T697] Call Trace:
> > [  507.590044][  T697]  
> > [  507.590357][  T697]  dump_stack_lvl+0xf6/0x180
> > [  507.590807][  T697]  print_report+0xc4/0x610
> > [  507.591259][  T697]  ? fixup_red_left+0x5/0x20
> > [  507.591781][  T697]  kasan_report+0xbe/0xf0
> > [  507.592241][  T697]  ? ftrace_push_return_trace.isra.0+0x346/0x370
> > [  507.592928][  T697]  ? ftrace_push_return_trace.isra.0+0x346/0x370
> > [  507.593535][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> > [  507.594076][  T697]  ? ftrace_replace_code+0x17a/0x230
> > [  507.594586][  T697]  ftrace_push_return_trace.isra.0+0x346/0x370
> > [  507.595192][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> > [  507.595747][  T697]  function_graph_enter_ops+0xbb/0x2d0
> > [  507.596271][  T697]  ? ftrace_replace_code+0x17a/0x230
> > [  507.596784][  T697]  ? __pfx_function_graph_enter_ops+0x10/0x10
> > [  507.597353][  T697]  ? preempt_count_sub+0x14/0xc0
> > [  507.598576][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> > [  507.599145][  T697]  ? __pfx_fuse_sync_fs+0x10/0x10
> > [  507.599718][  T697]  ftrace_graph_func+0x142/0x270
> > [  507.600293][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> > [  507.600892][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> > [  507.601484][  T697]  0xa0560097
> > [  507.602067][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> > [  507.602715][  T697]  ? text_poke_loc_init+0x5/0x2e0
> > [  507.603288][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> > [  507.603923][  T697]  text_poke_loc_init+0x5/0x2e0
> > [  507.604468][  T697]  ftrace_replace_code+0x17a/0x230
> > [  507.605071][  T697]  ftrace_modify_all_code+0x131/0x1a0
> > [  507.605663][  T697]  ftrace_startup+0x10b/0x210
> > [  507.606200][  T697]  register_ftrace_graph+0x313/0x8a0
> > [  507.606805][  T697]  ? register_ftrace_graph+0x3fe/0x8a0
> > [  507.607427][  T697]  register_fprobe_ips.part.0+0x25a/0x3f0
> > [  507.608090][  T697]  bpf_kprobe_multi_link_attach+0x49e/0x850
> > [  507.608781][  T697]  ? __pfx_bpf_kprobe_multi_link_attach+0x10/0x10
> > [  507.609500][  T697]  ? __debug_check_no_obj_freed+0x1d8/0x3a0
> > [  507.610194][  T697]  ? __fget_light+0x96/0xe0
> > [  507.610741][  T697]  __sys_bpf+0x307a/0x3180
> > [  507.611286][  T697]  ? __pfx___sys_bpf+0x10/0x10
> > [  

RE: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Luck, Tony
> But no, that's not the right question to ask.
>
> It is rather: which bits of information are very relevant to an error
> record and which are transient enough so that they cannot be gathered
> from a system by other means or only gathered in a difficult way, and
> should be part of that record.
>
> The PPIN is not transient but you have to go map ->extcpu to the PPIN so
> adding it to the tracepoint is purely a convenience thing. More or less.
>
> The microcode revision thing I still don't buy but it is already there
> so whateva...
>
> So we'd need a rule hammered out and put there in a prominent place so
> that it is clear what goes into struct mce and what not.

My personal evaluation of the value of these two additions to the trace record:

PPIN: Nice to have. People that send stuff to me are terrible about providing 
surrounding
details. The record already includes CPUID(1).EAX ... so I can at least skip 
the step of
asking them which CPU family/model/stepping they were using). But PPIN
can be recovered (so long as the submitter kept good records about which system
generated the record).

MICROCODE: Must have. Microcode version can be changed at run time. Going
back to the system to check later may not give the correct answer to what was 
active
at the time of the error. Especially for an error reported while a microcode 
update is
waling across the CPUs poking the MSR on each in turn.

-Tony


[syzbot] Monthly trace report (Jan 2024)

2024-01-26 Thread syzbot
Hello trace maintainers/developers,

This is a 31-day syzbot report for the trace subsystem.
All related reports/information can be found at:
https://syzkaller.appspot.com/upstream/s/trace

During the period, 2 new issues were detected and 0 were fixed.
In total, 8 issues are still open and 29 have been fixed so far.

Some of the still happening issues:

Ref Crashes Repro Title
<1> 8174Yes   possible deadlock in task_fork_fair
  https://syzkaller.appspot.com/bug?extid=1a93ee5d329e97cfbaff
<2> 235 Yes   WARNING in format_decode (3)
  https://syzkaller.appspot.com/bug?extid=e2c932aec5c8a6e1d31c
<3> 26  Yes   WARNING in blk_register_tracepoints
  https://syzkaller.appspot.com/bug?extid=c54ded83396afee31eb1
<4> 12  Yes   INFO: task hung in blk_trace_ioctl (4)
  https://syzkaller.appspot.com/bug?extid=ed812ed461471ab17a0c
<5> 5   Yes   WARNING in get_probe_ref
  https://syzkaller.appspot.com/bug?extid=8672dcb9d10011c0a160

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

To disable reminders for individual bugs, reply with the following command:
#syz set  no-reminders

To change bug's subsystems, reply with:
#syz set  subsystems: new-subsystem

You may send multiple commands in a single email message.



Re: [PATCH v7 5/5] dax: add a sysfs knob to control memmap_on_memory behavior

2024-01-26 Thread Alison Schofield
On Wed, Jan 24, 2024 at 12:03:50PM -0800, Vishal Verma wrote:
> Add a sysfs knob for dax devices to control the memmap_on_memory setting
> if the dax device were to be hotplugged as system memory.
> 
> The default memmap_on_memory setting for dax devices originating via
> pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
> preserve legacy behavior. For dax devices via CXL, the default is on.
> The sysfs control allows the administrator to override the above
> defaults if needed.

Reviewed-by: Alison Schofield 

> 
> Cc: David Hildenbrand 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Tested-by: Li Zhijian 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Huang, Ying 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/dax/bus.c   | 43 
> +
>  Documentation/ABI/testing/sysfs-bus-dax | 17 +
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 0fd948a4443e..27c86d0ca711 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1349,6 +1349,48 @@ static ssize_t numa_node_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(numa_node);
>  
> +static ssize_t memmap_on_memory_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> +
> + return sysfs_emit(buf, "%d\n", dev_dax->memmap_on_memory);
> +}
> +
> +static ssize_t memmap_on_memory_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t len)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + bool val;
> + int rc;
> +
> + rc = kstrtobool(buf, );
> + if (rc)
> + return rc;
> +
> + if (val == true && !mhp_supports_memmap_on_memory()) {
> + dev_dbg(dev, "memmap_on_memory is not available\n");
> + return -EOPNOTSUPP;
> + }
> +
> + rc = down_write_killable(_dev_rwsem);
> + if (rc)
> + return rc;
> +
> + if (dev_dax->memmap_on_memory != val && dev->driver &&
> + to_dax_drv(dev->driver)->type == DAXDRV_KMEM_TYPE) {
> + up_write(_dev_rwsem);
> + return -EBUSY;
> + }
> +
> + dev_dax->memmap_on_memory = val;
> + up_write(_dev_rwsem);
> +
> + return len;
> +}
> +static DEVICE_ATTR_RW(memmap_on_memory);
> +
>  static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, 
> int n)
>  {
>   struct device *dev = container_of(kobj, struct device, kobj);
> @@ -1375,6 +1417,7 @@ static struct attribute *dev_dax_attributes[] = {
>   _attr_align.attr,
>   _attr_resource.attr,
>   _attr_numa_node.attr,
> + _attr_memmap_on_memory.attr,
>   NULL,
>  };
>  
> diff --git a/Documentation/ABI/testing/sysfs-bus-dax 
> b/Documentation/ABI/testing/sysfs-bus-dax
> index 6359f7bc9bf4..b34266bfae49 100644
> --- a/Documentation/ABI/testing/sysfs-bus-dax
> +++ b/Documentation/ABI/testing/sysfs-bus-dax
> @@ -134,3 +134,20 @@ KernelVersion:   v5.1
>  Contact: nvd...@lists.linux.dev
>  Description:
>   (RO) The id attribute indicates the region id of a dax region.
> +
> +What:/sys/bus/dax/devices/daxX.Y/memmap_on_memory
> +Date:January, 2024
> +KernelVersion:   v6.8
> +Contact: nvd...@lists.linux.dev
> +Description:
> + (RW) Control the memmap_on_memory setting if the dax device
> + were to be hotplugged as system memory. This determines whether
> + the 'altmap' for the hotplugged memory will be placed on the
> + device being hotplugged (memmap_on_memory=1) or if it will be
> + placed on regular memory (memmap_on_memory=0). This attribute
> + must be set before the device is handed over to the 'kmem'
> + driver (i.e.  hotplugged into system-ram). Additionally, this
> + depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
> + memmap_on_memory parameter for memory_hotplug. This is
> + typically set on the kernel command line -
> + memory_hotplug.memmap_on_memory set to 'true' or 'force'."
> 
> -- 
> 2.43.0
> 
> 



Re: [RESEND] [PATCH] eventfs: Have inodes have unique inode numbers

2024-01-26 Thread Steven Rostedt
On Fri, 26 Jan 2024 15:24:17 -0500
Mathieu Desnoyers  wrote:

> On 2024-01-26 15:12, Steven Rostedt wrote:
> [...]
> > diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> > index e1b172c0e091..2187be6d7b23 100644
> > --- a/fs/tracefs/inode.c
> > +++ b/fs/tracefs/inode.c
> > @@ -223,13 +223,41 @@ static const struct inode_operations 
> > tracefs_file_inode_operations = {
> > .setattr= tracefs_setattr,
> >   };
> >   
> > +/* Copied from get_next_ino() but adds allocation for multiple inodes */
> > +#define LAST_INO_BATCH 1024
> > +#define LAST_INO_MASK (~(LAST_INO_BATCH - 1))
> > +static DEFINE_PER_CPU(unsigned int, last_ino);
> > +
> > +unsigned int tracefs_get_next_ino(int files)
> > +{
> > +   unsigned int *p = _cpu_var(last_ino);
> > +   unsigned int res = *p;
> > +
> > +#ifdef CONFIG_SMP
> > +   /* Check if adding files+1 overflows */  
> 
> How does it handle a @files input where:
> 
> * (files+1 > LAST_INO_BATCH) ?
> 
> * (files+1 == LAST_INO_BATCH) ?

Well, this is moot anyway, as Linus hates it.

-- Steve



Re: [PATCH v7 2/5] dax/bus.c: replace several sprintf() with sysfs_emit()

2024-01-26 Thread Alison Schofield
On Wed, Jan 24, 2024 at 12:03:47PM -0800, Vishal Verma wrote:
> There were several places where drivers/dax/bus.c uses 'sprintf' to
> print sysfs data. Since a sysfs_emit() helper is available specifically
> for this purpose, replace all the sprintf() usage for sysfs with
> sysfs_emit() in this file.
>

Reviewed-by: Alison Schofield 


> Cc: Dan Williams 
> Reported-by: Greg Kroah-Hartman 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/dax/bus.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index cb148f74ceda..0fd948a4443e 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -269,7 +269,7 @@ static ssize_t id_show(struct device *dev,
>  {
>   struct dax_region *dax_region = dev_get_drvdata(dev);
>  
> - return sprintf(buf, "%d\n", dax_region->id);
> + return sysfs_emit(buf, "%d\n", dax_region->id);
>  }
>  static DEVICE_ATTR_RO(id);
>  
> @@ -278,8 +278,8 @@ static ssize_t region_size_show(struct device *dev,
>  {
>   struct dax_region *dax_region = dev_get_drvdata(dev);
>  
> - return sprintf(buf, "%llu\n", (unsigned long long)
> - resource_size(_region->res));
> + return sysfs_emit(buf, "%llu\n",
> +   (unsigned long long)resource_size(_region->res));
>  }
>  static struct device_attribute dev_attr_region_size = __ATTR(size, 0444,
>   region_size_show, NULL);
> @@ -289,7 +289,7 @@ static ssize_t region_align_show(struct device *dev,
>  {
>   struct dax_region *dax_region = dev_get_drvdata(dev);
>  
> - return sprintf(buf, "%u\n", dax_region->align);
> + return sysfs_emit(buf, "%u\n", dax_region->align);
>  }
>  static struct device_attribute dev_attr_region_align =
>   __ATTR(align, 0400, region_align_show, NULL);
> @@ -322,7 +322,7 @@ static ssize_t available_size_show(struct device *dev,
>   size = dax_region_avail_size(dax_region);
>   up_read(_region_rwsem);
>  
> - return sprintf(buf, "%llu\n", size);
> + return sysfs_emit(buf, "%llu\n", size);
>  }
>  static DEVICE_ATTR_RO(available_size);
>  
> @@ -340,7 +340,7 @@ static ssize_t seed_show(struct device *dev,
>   if (rc)
>   return rc;
>   seed = dax_region->seed;
> - rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
> + rc = sysfs_emit(buf, "%s\n", seed ? dev_name(seed) : "");
>   up_read(_region_rwsem);
>  
>   return rc;
> @@ -361,7 +361,7 @@ static ssize_t create_show(struct device *dev,
>   if (rc)
>   return rc;
>   youngest = dax_region->youngest;
> - rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
> + rc = sysfs_emit(buf, "%s\n", youngest ? dev_name(youngest) : "");
>   up_read(_region_rwsem);
>  
>   return rc;
> @@ -763,7 +763,7 @@ static ssize_t start_show(struct device *dev,
>   dax_range = get_dax_range(dev);
>   if (!dax_range)
>   return -ENXIO;
> - rc = sprintf(buf, "%#llx\n", dax_range->range.start);
> + rc = sysfs_emit(buf, "%#llx\n", dax_range->range.start);
>   put_dax_range();
>  
>   return rc;
> @@ -779,7 +779,7 @@ static ssize_t end_show(struct device *dev,
>   dax_range = get_dax_range(dev);
>   if (!dax_range)
>   return -ENXIO;
> - rc = sprintf(buf, "%#llx\n", dax_range->range.end);
> + rc = sysfs_emit(buf, "%#llx\n", dax_range->range.end);
>   put_dax_range();
>  
>   return rc;
> @@ -795,7 +795,7 @@ static ssize_t pgoff_show(struct device *dev,
>   dax_range = get_dax_range(dev);
>   if (!dax_range)
>   return -ENXIO;
> - rc = sprintf(buf, "%#lx\n", dax_range->pgoff);
> + rc = sysfs_emit(buf, "%#lx\n", dax_range->pgoff);
>   put_dax_range();
>  
>   return rc;
> @@ -969,7 +969,7 @@ static ssize_t size_show(struct device *dev,
>   size = dev_dax_size(dev_dax);
>   up_write(_dev_rwsem);
>  
> - return sprintf(buf, "%llu\n", size);
> + return sysfs_emit(buf, "%llu\n", size);
>  }
>  
>  static bool alloc_is_aligned(struct dev_dax *dev_dax, resource_size_t size)
> @@ -1233,7 +1233,7 @@ static ssize_t align_show(struct device *dev,
>  {
>   struct dev_dax *dev_dax = to_dev_dax(dev);
>  
> - return sprintf(buf, "%d\n", dev_dax->align);
> + return sysfs_emit(buf, "%d\n", dev_dax->align);
>  }
>  
>  static ssize_t dev_dax_validate_align(struct dev_dax *dev_dax)
> @@ -1311,7 +1311,7 @@ static ssize_t target_node_show(struct device *dev,
>  {
>   struct dev_dax *dev_dax = to_dev_dax(dev);
>  
> - return sprintf(buf, "%d\n", dev_dax_target_node(dev_dax));
> + return sysfs_emit(buf, "%d\n", dev_dax_target_node(dev_dax));
>  }
>  static DEVICE_ATTR_RO(target_node);
>  
> @@ -1327,7 +1327,7 @@ static ssize_t resource_show(struct device *dev,
>   else
>   start = dev_dax->ranges[0].range.start;
>  
> - return sprintf(buf, "%#llx\n", start);
> + 

Re: [PATCH v7 1/5] dax/bus.c: replace driver-core lock usage by a local rwsem

2024-01-26 Thread Alison Schofield
On Wed, Jan 24, 2024 at 12:03:46PM -0800, Vishal Verma wrote:
> The dax driver incorrectly used driver-core device locks to protect
> internal dax region and dax device configuration structures. Replace the
> device lock usage with a local rwsem, one each for dax region
> configuration and dax device configuration. As a result of this
> conversion, no device_lock() usage remains in dax/bus.c.
> 

Reviewed-by: Alison Schofield 

> Cc: Dan Williams 
> Reported-by: Greg Kroah-Hartman 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/dax/bus.c | 220 
> ++
>  1 file changed, 157 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1ff1ab5fa105..cb148f74ceda 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -12,6 +12,18 @@
>  
>  static DEFINE_MUTEX(dax_bus_lock);
>  
> +/*
> + * All changes to the dax region configuration occur with this lock held
> + * for write.
> + */
> +DECLARE_RWSEM(dax_region_rwsem);
> +
> +/*
> + * All changes to the dax device configuration occur with this lock held
> + * for write.
> + */
> +DECLARE_RWSEM(dax_dev_rwsem);
> +
>  #define DAX_NAME_LEN 30
>  struct dax_id {
>   struct list_head list;
> @@ -180,7 +192,7 @@ static u64 dev_dax_size(struct dev_dax *dev_dax)
>   u64 size = 0;
>   int i;
>  
> - device_lock_assert(_dax->dev);
> + WARN_ON_ONCE(!rwsem_is_locked(_dev_rwsem));
>  
>   for (i = 0; i < dev_dax->nr_range; i++)
>   size += range_len(_dax->ranges[i].range);
> @@ -194,8 +206,15 @@ static int dax_bus_probe(struct device *dev)
>   struct dev_dax *dev_dax = to_dev_dax(dev);
>   struct dax_region *dax_region = dev_dax->region;
>   int rc;
> + u64 size;
>  
> - if (dev_dax_size(dev_dax) == 0 || dev_dax->id < 0)
> + rc = down_read_interruptible(_dev_rwsem);
> + if (rc)
> + return rc;
> + size = dev_dax_size(dev_dax);
> + up_read(_dev_rwsem);
> +
> + if (size == 0 || dev_dax->id < 0)
>   return -ENXIO;
>  
>   rc = dax_drv->probe(dev_dax);
> @@ -283,7 +302,7 @@ static unsigned long long dax_region_avail_size(struct 
> dax_region *dax_region)
>   resource_size_t size = resource_size(_region->res);
>   struct resource *res;
>  
> - device_lock_assert(dax_region->dev);
> + WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem));
>  
>   for_each_dax_region_resource(dax_region, res)
>   size -= resource_size(res);
> @@ -295,10 +314,13 @@ static ssize_t available_size_show(struct device *dev,
>  {
>   struct dax_region *dax_region = dev_get_drvdata(dev);
>   unsigned long long size;
> + int rc;
>  
> - device_lock(dev);
> + rc = down_read_interruptible(_region_rwsem);
> + if (rc)
> + return rc;
>   size = dax_region_avail_size(dax_region);
> - device_unlock(dev);
> + up_read(_region_rwsem);
>  
>   return sprintf(buf, "%llu\n", size);
>  }
> @@ -314,10 +336,12 @@ static ssize_t seed_show(struct device *dev,
>   if (is_static(dax_region))
>   return -EINVAL;
>  
> - device_lock(dev);
> + rc = down_read_interruptible(_region_rwsem);
> + if (rc)
> + return rc;
>   seed = dax_region->seed;
>   rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
> - device_unlock(dev);
> + up_read(_region_rwsem);
>  
>   return rc;
>  }
> @@ -333,14 +357,18 @@ static ssize_t create_show(struct device *dev,
>   if (is_static(dax_region))
>   return -EINVAL;
>  
> - device_lock(dev);
> + rc = down_read_interruptible(_region_rwsem);
> + if (rc)
> + return rc;
>   youngest = dax_region->youngest;
>   rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
> - device_unlock(dev);
> + up_read(_region_rwsem);
>  
>   return rc;
>  }
>  
> +static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data);
> +
>  static ssize_t create_store(struct device *dev, struct device_attribute 
> *attr,
>   const char *buf, size_t len)
>  {
> @@ -358,7 +386,9 @@ static ssize_t create_store(struct device *dev, struct 
> device_attribute *attr,
>   if (val != 1)
>   return -EINVAL;
>  
> - device_lock(dev);
> + rc = down_write_killable(_region_rwsem);
> + if (rc)
> + return rc;
>   avail = dax_region_avail_size(dax_region);
>   if (avail == 0)
>   rc = -ENOSPC;
> @@ -369,7 +399,7 @@ static ssize_t create_store(struct device *dev, struct 
> device_attribute *attr,
>   .id = -1,
>   .memmap_on_memory = false,
>   };
> - struct dev_dax *dev_dax = devm_create_dev_dax();
> + struct dev_dax *dev_dax = __devm_create_dev_dax();
>  
>   if (IS_ERR(dev_dax))
>   rc = PTR_ERR(dev_dax);
> @@ -387,7 +417,7 @@ static ssize_t create_store(struct 

Re: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Borislav Petkov
On Fri, Jan 26, 2024 at 08:49:03PM +, Luck, Tony wrote:
> Every patch that adds new code or data structures adds to the kernel
> memory footprint. Each should be considered on its merits. The basic
> question being:
> 
>"Is the new functionality worth the cost?"
> 
> Where does it end? It would end if Linus declared:
> 
>   "Linux is now complete. Stop sending patches".
> 
> I.e. it is never going to end.

No, it's not that - it is the merit thing which determines.

> 1) PPIN
> Cost = 8 bytes.
> Benefit: Emdeds a system identifier into the trace record so there
> can be no ambiguity about which machine generated this error.
> Also definitively indicates which socket on a multi-socket system.
> 
> 2) MICROCODE
> Cost = 4 bytes
> Benefit: Certainty about the microcode version active on the core
> at the time the error was detected.
> 
> RAS = Reliability, Availability, Serviceability
> 
> These changes fall into the serviceability bucket. They make it
> easier to diagnose what went wrong.

So does dmesg. Let's add it to the tracepoint...

But no, that's not the right question to ask.

It is rather: which bits of information are very relevant to an error
record and which are transient enough so that they cannot be gathered
from a system by other means or only gathered in a difficult way, and
should be part of that record.

The PPIN is not transient but you have to go map ->extcpu to the PPIN so
adding it to the tracepoint is purely a convenience thing. More or less.

The microcode revision thing I still don't buy but it is already there
so whateva...

So we'd need a rule hammered out and put there in a prominent place so
that it is clear what goes into struct mce and what not.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



RE: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Luck, Tony
> > Is it so very different to add this to a trace record so that rasdaemon
> > can have feature parity with mcelog(8)?
>
> I knew you were gonna say that. When someone decides that it is
> a splendid idea to add more stuff to struct mce then said someone would
> want it in the tracepoint too.
>
> And then we're back to my original question:
>
> "And where does it end? Stick full dmesg in the tracepoint too?"
>
> Where do you draw the line in the sand and say, no more, especially
> static, fields bloating the trace record should be added and from then
> on, you should go collect the info from that box. Something which you're
> supposed to do anyway.

Every patch that adds new code or data structures adds to the kernel
memory footprint. Each should be considered on its merits. The basic
question being:

   "Is the new functionality worth the cost?"

Where does it end? It would end if Linus declared:

  "Linux is now complete. Stop sending patches".

I.e. it is never going to end.

If somebody posts a patch asking to add the full dmesg to a
tracepoint, I'll stand with you to say: "Not only no, but hell no".

So for Naik's two patches we have:

1) PPIN
Cost = 8 bytes.
Benefit: Emdeds a system identifier into the trace record so there
can be no ambiguity about which machine generated this error.
Also definitively indicates which socket on a multi-socket system.

2) MICROCODE
Cost = 4 bytes
Benefit: Certainty about the microcode version active on the core
at the time the error was detected.

RAS = Reliability, Availability, Serviceability

These changes fall into the serviceability bucket. They make it
easier to diagnose what went wrong.


-Tony


Re: [RESEND] [PATCH] eventfs: Have inodes have unique inode numbers

2024-01-26 Thread Mathieu Desnoyers

On 2024-01-26 15:12, Steven Rostedt wrote:
[...]

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index e1b172c0e091..2187be6d7b23 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -223,13 +223,41 @@ static const struct inode_operations 
tracefs_file_inode_operations = {
.setattr= tracefs_setattr,
  };
  
+/* Copied from get_next_ino() but adds allocation for multiple inodes */

+#define LAST_INO_BATCH 1024
+#define LAST_INO_MASK (~(LAST_INO_BATCH - 1))
+static DEFINE_PER_CPU(unsigned int, last_ino);
+
+unsigned int tracefs_get_next_ino(int files)
+{
+   unsigned int *p = _cpu_var(last_ino);
+   unsigned int res = *p;
+
+#ifdef CONFIG_SMP
+   /* Check if adding files+1 overflows */


How does it handle a @files input where:

* (files+1 > LAST_INO_BATCH) ?

* (files+1 == LAST_INO_BATCH) ?


+   if (unlikely(!res || (res & LAST_INO_MASK) != ((res + files + 1) & 
LAST_INO_MASK))) {
+   static atomic_t shared_last_ino;
+   int next = atomic_add_return(LAST_INO_BATCH, _last_ino);
+
+   res = next - LAST_INO_BATCH;
+   }
+#endif
+
+   res++;
+   /* get_next_ino should not provide a 0 inode number */
+   if (unlikely(!res))
+   res++;


I suspect that bumping this res++ in the 0 case can cause inode range
reservation issues at (files+1 == LAST_INO_BATCH-1).

Thanks,

Mathieu


+   *p = res + files;
+   put_cpu_var(last_ino);
+   return res;
+}


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [REGRESSION] v5.13: FS_DAX unavailable on 32-bit ARM

2024-01-26 Thread Arnd Bergmann
On Fri, Jan 26, 2024, at 20:33, Mathieu Desnoyers wrote:
>
> A) I have prepared a patch series introducing cache_is_aliasing() with 
> new Kconfig
> options:
>
>* ARCH_HAS_CACHE_ALIASING
>* ARCH_HAS_CACHE_ALIASING_DYNAMIC
>
> and implemented it for all architectures. The "DYNAMIC" implementation
> implements cache_is_aliasing() as a runtime check, which is what is needed
> on architectures like 32-bit ARM.
>
> With this we can basically narrow down the list of architectures which are
> unsupported by DAX to those which are really affected, without actually 
> solving
> the issue for architectures with virtually aliased dcaches.

The dynamic option should only be required when building for
ARMv6, which is really rare. On an ARMv7-only configuration,
we know that the dcache is non-aliasing, so the compile-time
check should be sufficient.

Even on ARMv6, this could be done as a compile-time choice
by platform, since we mostly know what the chips can do:
bcm2835, imx3, wm8750 and s3c64xx are non-aliasing because
they are limited to 16KB L1 caches, while omap2 and as2500
are aliasing with 32KB caches. With realview/integrator it
depends on the exact CPU that was installed.

 Arnd



[RESEND] [PATCH] eventfs: Have inodes have unique inode numbers

2024-01-26 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Linus suggested to use the same inode numbers to make it easier to
implement getdents(), as it was creating inodes just for generating a
unique and consistent inode number. Linus suggested to just use the same
inode for all files and directories.

Later it was discovered that having directories with the same inode number
would mess up the "find" command, but Linus found that on 64 bit machines,
there was a hole in the eventfs_inode structure due to alignment that
could be used to store the inode numbers for directories. That fixed the
directory issue, but the files still had their own inode number.

The 'tar' command uses inode numbers for determining uniqueness between
files, which this would break. Currently, tar is broken with tracefs
because all files show a stat of zero size and tar doesn't copy anything.
But because tar cares about inode numbers, there could be other
applications that do too. It's best to have all files have unique inode
numbers.

Copy the get_next_ino() to tracefs_get_next_ino() that takes a "files"
parameter. As eventfs directories have a fixed number of files within
them, the number of inodes needed for the eventfs directory files is known
when the directory is created. The tracefs_get_next_ino() will return a
new inode number but also reserve the next "files" inode numbers that the
caller is free to use. Then when an inode for a file is created, its inode
number will be its parent directory's inode number plus the index into the
file array of that directory, giving each file a unique inode number that
can be retrieved at any time.

Signed-off-by: Steven Rostedt (Google) 
---
[
  Resending because I sent the first one to the wrong mailing list.
]

 fs/tracefs/event_inode.c | 31 +++
 fs/tracefs/inode.c   | 37 ++---
 fs/tracefs/internal.h|  1 +
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6b211522a13e..7be7a694b106 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -32,14 +32,11 @@
  */
 static DEFINE_MUTEX(eventfs_mutex);
 
-/* Choose something "unique" ;-) */
-#define EVENTFS_FILE_INODE_INO 0x12c4e37
-
 /* Just try to make something consistent and unique */
-static int eventfs_dir_ino(struct eventfs_inode *ei)
+static int eventfs_dir_ino(struct eventfs_inode *ei, int nr_files)
 {
if (!ei->ino)
-   ei->ino = get_next_ino();
+   ei->ino = tracefs_get_next_ino(nr_files);
 
return ei->ino;
 }
@@ -327,6 +324,7 @@ void eventfs_update_gid(struct dentry *dentry, kgid_t gid)
  * @parent: parent dentry for this file.
  * @data: something that the caller will want to get to later on.
  * @fop: struct file_operations that should be used for this file.
+ * @ino: inode number for this file
  *
  * This function creates a dentry that represents a file in the eventsfs_inode
  * directory. The inode.i_private pointer will point to @data in the open()
@@ -335,7 +333,8 @@ void eventfs_update_gid(struct dentry *dentry, kgid_t gid)
 static struct dentry *create_file(const char *name, umode_t mode,
  struct eventfs_attr *attr,
  struct dentry *parent, void *data,
- const struct file_operations *fop)
+ const struct file_operations *fop,
+ unsigned int ino)
 {
struct tracefs_inode *ti;
struct dentry *dentry;
@@ -363,9 +362,7 @@ static struct dentry *create_file(const char *name, umode_t 
mode,
inode->i_op = _file_inode_operations;
inode->i_fop = fop;
inode->i_private = data;
-
-   /* All files will have the same inode number */
-   inode->i_ino = EVENTFS_FILE_INODE_INO;
+   inode->i_ino = ino;
 
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
@@ -377,12 +374,14 @@ static struct dentry *create_file(const char *name, 
umode_t mode,
 /**
  * create_dir - create a dir in the tracefs filesystem
  * @ei: the eventfs_inode that represents the directory to create
- * @parent: parent dentry for this file.
+ * @parent: parent dentry for this directory.
+ * @nr_files: The number of files (not directories) this directory has
  *
  * This function will create a dentry for a directory represented by
  * a eventfs_inode.
  */
-static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry 
*parent)
+static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry 
*parent,
+int nr_files)
 {
struct tracefs_inode *ti;
struct dentry *dentry;
@@ -404,7 +403,7 @@ static struct dentry *create_dir(struct eventfs_inode *ei, 
struct dentry *parent
inode->i_fop = _file_operations;
 
/* All directories will have the same inode number */
-   inode->i_ino = 

Re: [PATCH 2/4] tracing/user_events: Introduce multi-format events

2024-01-26 Thread Steven Rostedt
On Fri, 26 Jan 2024 11:10:07 -0800
Beau Belgrave  wrote:

> > OK, so the each different event has suffixed name. But this will
> > introduce non C-variable name.
> > 
> > Steve, do you think your library can handle these symbols? It will
> > be something like "event:[1]" as the event name.
> > Personally I like "event.1" style. (of course we need to ensure the
> > user given event name is NOT including such suffix numbers)
> >   
> 
> Just to clarify around events including a suffix number. This is why
> multi-events use "user_events_multi" system name and the single-events
> using just "user_events".
> 
> Even if a user program did include a suffix, the suffix would still get
> appended. An example is "test" vs "test:[0]" using multi-format would
> result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively
> (assuming these are the first multi-events on the system).
> 
> I'm with you, we really don't want any spoofing or squatting possible.
> By using different system names and always appending the suffix I
> believe covers this.
> 
> Looking forward to hearing Steven's thoughts on this as well.

I'm leaning towards Masami's suggestion to use dots, as that won't conflict
with special characters from bash, as '[' and ']' do.

-- Steve



Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-26 Thread Peter Collingbourne
On Thu, Jan 25, 2024 at 8:43 AM Alexandru Elisei
 wrote:
>
> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> set in vma_alloc_zeroed_movable_folio().
>
> Expand this to be more generic by adding an arch hook that modifes the gfp
> flags for an allocation when the VMA is known.
>
> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> is also set; from that point of view, the current behaviour is unchanged,
> even though the arm64 flag is set in more places.  When arm64 will have
> support to reuse the tag storage for data allocation, the uses of the
> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> to reserve the corresponding tag storage for the pages being allocated.
>
> The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> the caller; this has been done to keep an architecture from modifying the
> flags already set by the core memory management code; this is similar to
> how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> implemented. This can be revisited in the future if there's a need to do
> so.
>
> Signed-off-by: Alexandru Elisei 

This patch also needs to update the non-CONFIG_NUMA definition of
vma_alloc_folio in include/linux/gfp.h to call arch_calc_vma_gfp. See:
https://r.android.com/2849146

Peter



Re: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Borislav Petkov
On Fri, Jan 26, 2024 at 07:15:50PM +, Luck, Tony wrote:
> If deployment of a microcode update across a fleet always went
> flawlessly, life would be simpler. But things can fail. And maybe the
> failure wasn't noticed. Perhaps a node was rebooting when the sysadmin
> pushed the update to the fleet and so missed the deployment. Perhaps
> one core was already acting weird and the microcode update didn't get
> applied to that core.

Yes, and you go collect data from that box. You will have to anyway to
figure out why the microcode didn't update.

> Swapping a hard drive, or hot plugging a NIC isn't very likely
> to correlate with an error reported by the CPU in machine
> check banks.

Ofc it is - coherent probe timeoutting due to problematic insertion
could be reported with a MCE, and so on and so on.

> Is it so very different to add this to a trace record so that rasdaemon
> can have feature parity with mcelog(8)?

I knew you were gonna say that. When someone decides that it is
a splendid idea to add more stuff to struct mce then said someone would
want it in the tracepoint too.

And then we're back to my original question: 

"And where does it end? Stick full dmesg in the tracepoint too?"

Where do you draw the line in the sand and say, no more, especially
static, fields bloating the trace record should be added and from then
on, you should go collect the info from that box. Something which you're
supposed to do anyway.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size

2024-01-26 Thread Steven Rostedt
On Fri, 26 Jan 2024 11:06:33 -0800
Linus Torvalds  wrote:

> On Fri, 26 Jan 2024 at 10:41, Steven Rostedt  wrote:
> >
> > Fine, but I still plan on sending you the update to give all files unique
> > inode numbers. If it screws up tar, it could possibly screw up something
> > else.  
> 
> Well, that in many ways just regularizes the code, and the dynamic
> inode numbers are actually prettier than the odd fixed date-based one
> you picked. I assume it's your birthdate (although I don't know what
> the directory ino number was).

Yeah, it was. I usually use that when I need a random number. I avoid using
it for passwords though. The odd directory number was the date you pulled
in eventfs ;-)

-- Steve




[REGRESSION] v5.13: FS_DAX unavailable on 32-bit ARM

2024-01-26 Thread Mathieu Desnoyers

Hi,

This commit introduced in v5.13 prevents building FS_DAX on 32-bit ARM,
even on ARMv7 which does not have virtually aliased dcaches:

commit d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")

It used to work fine before: I have customers using dax over pmem on ARMv7, but
this regression will likely prevent them from upgrading their kernel.

The root of the issue here is the fact that DAX was never designed to handle
virtually aliased dcache (VIVT and VIPT with aliased dcache). It touches the
pages through their linear mapping, which is not consistent with the userspace
mappings on virtually aliased dcaches.

I can see a few ways forward to address this:

A) I have prepared a patch series introducing cache_is_aliasing() with new 
Kconfig
   options:

  * ARCH_HAS_CACHE_ALIASING
  * ARCH_HAS_CACHE_ALIASING_DYNAMIC

and implemented it for all architectures. The "DYNAMIC" implementation
implements cache_is_aliasing() as a runtime check, which is what is needed
on architectures like 32-bit ARM.

With this we can basically narrow down the list of architectures which are
unsupported by DAX to those which are really affected, without actually solving
the issue for architectures with virtually aliased dcaches.

B) Another approach would be to dig into what exactly DAX is doing with the 
linear
   mapping, and try to fix this. I see two options there:

B.1) Either we extend vmap to allow vmap'd pages to be aligned on specific 
multiples,
 and use a coloring trick based on SHMLBA like userspace mappings do for 
all DAX
 internal pages accesses, or

B.2) We introduce flush_dcache_folio() at relevant spots (perhaps dax_flush() 
?) to
 synchronize the linear mapping wrt userspace mappings. (would this work ?)

Any thoughts on how to best move forward with this issue are welcome.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



RE: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Luck, Tony
> > You've spent enough time with Ashok and Thomas tweaking the Linux
> > microcode driver to know that going back to the machine the next day
> > to ask about microcode version has a bunch of ways to get a wrong
> > answer.
>
> Huh, what does that have to do with this?

If deployment of a microcode update across a fleet always went
flawlessly, life would be simpler. But things can fail. And maybe the
failure wasn't noticed. Perhaps a node was rebooting when the
sysadmin pushed the update to the fleet and so missed the
deployment. Perhaps one core was already acting weird and
the microcode update didn't get applied to that core.

> IIUC, if someone changes something on the system, whether that is
> updating microcode or swapping a harddrive or swapping memory or
> whatever, that invalidates the errors reported, pretty much.
>
> You can't put it all in the trace record, you just can't.

Swapping a hard drive, or hot plugging a NIC isn't very likely
to correlate with an error reported by the CPU in machine
check banks. But microcode can be (and has been) the issue
in enough cases that knowing the version at the time of the
error matters.

You seemed to agree with this argument when the microcode
field was added to "struct mce" back in 2018

fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")

Is it so very different to add this to a trace record so that rasdaemon
can have feature parity with mcelog(8)?

-Tony


Re: [PATCH 2/4] tracing/user_events: Introduce multi-format events

2024-01-26 Thread Beau Belgrave
On Sat, Jan 27, 2024 at 12:01:04AM +0900, Masami Hiramatsu wrote:
> On Tue, 23 Jan 2024 22:08:42 +
> Beau Belgrave  wrote:
> 
> > Add a register_name (reg_name) to the user_event struct which allows for
> > split naming of events. We now have the name that was used to register
> > within user_events as well as the unique name for the tracepoint. Upon
> > registering events ensure matches based on first the reg_name, followed
> > by the fields and format of the event. This allows for multiple events
> > with the same registered name to have different formats. The underlying
> > tracepoint will have a unique name in the format of {reg_name}:[unique_id].
> > 
> > For example, if both "test u32 value" and "test u64 value" are used with
> > the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> > tracepoints. The dynamic_events file would then show the following:
> >   u:test u64 count
> >   u:test u32 count
> > 
> > The actual tracepoint names look like this:
> >   test:[d5874fdac44]
> >   test:[d5914662cd4]
> > 
> > Both would be under the new user_events_multi system name to prevent the
> > older ABI from being used to squat on multi-formatted events and block
> > their use.
> [...]
> > @@ -1923,6 +1972,39 @@ static int user_event_trace_register(struct 
> > user_event *user)
> > return ret;
> >  }
> >  
> > +static int user_event_set_tp_name(struct user_event *user)
> > +{
> > +   lockdep_assert_held(>group->reg_mutex);
> > +
> > +   if (EVENT_MULTI_FORMAT(user->reg_flags)) {
> > +   char *multi_name;
> > +   int len;
> > +
> > +   len = snprintf(NULL, 0, "%s:[%llx]", user->reg_name,
> > +  user->group->multi_id) + 1;
> > +
> > +   multi_name = kzalloc(len, GFP_KERNEL_ACCOUNT);
> > +
> > +   if (!multi_name)
> > +   return -ENOMEM;
> > +
> > +   snprintf(multi_name, len, "%s:[%llx]", user->reg_name,
> > +user->group->multi_id);
> 
> OK, so the each different event has suffixed name. But this will
> introduce non C-variable name.
> 
> Steve, do you think your library can handle these symbols? It will
> be something like "event:[1]" as the event name.
> Personally I like "event.1" style. (of course we need to ensure the
> user given event name is NOT including such suffix numbers)
> 

Just to clarify around events including a suffix number. This is why
multi-events use "user_events_multi" system name and the single-events
using just "user_events".

Even if a user program did include a suffix, the suffix would still get
appended. An example is "test" vs "test:[0]" using multi-format would
result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively
(assuming these are the first multi-events on the system).

I'm with you, we really don't want any spoofing or squatting possible.
By using different system names and always appending the suffix I
believe covers this.

Looking forward to hearing Steven's thoughts on this as well.

Thanks,
-Beau

> Thank you.
> 
> -- 
> Masami Hiramatsu (Google) 



Re: [PATCH v2 0/5] PM: domains: Add helpers for multi PM domains to avoid open-coding

2024-01-26 Thread Greg Kroah-Hartman
On Fri, Jan 26, 2024 at 05:12:12PM +0100, Ulf Hansson wrote:
> On Fri, 5 Jan 2024 at 17:01, Ulf Hansson  wrote:
> >
> > Updates in v2:
> > - Ccing Daniel Baluta and Iuliana Prodan the NXP remoteproc patches 
> > to
> > requests help with testing.
> > - Fixed NULL pointer bug in patch1, pointed out by Nikunj.
> > - Added some tested/reviewed-by tags.
> >
> >
> > Attaching/detaching of a device to multiple PM domains has started to 
> > become a
> > common operation for many drivers, typically during ->probe() and 
> > ->remove().
> > In most cases, this has lead to lots of boilerplate code in the drivers.
> >
> > This series adds a pair of helper functions to manage the attach/detach of a
> > device to its multiple PM domains. Moreover, a couple of drivers have been
> > converted to use the new helpers as a proof of concept.
> >
> > Note 1)
> > The changes in the drivers have only been compile tested, while the helpers
> > have been tested along with a couple of local dummy drivers that I have 
> > hacked
> > up to model both genpd providers and genpd consumers.
> >
> > Note 2)
> > I was struggling to make up mind if we should have a separate helper to 
> > attach
> > all available power-domains described in DT, rather than providing "NULL" 
> > to the
> > dev_pm_domain_attach_list(). I decided not to, but please let me know if you
> > prefer the other option.
> >
> > Note 3)
> > For OPP integration, as a follow up I am striving to make the
> > dev_pm_opp_attach_genpd() redundant. Instead I think we should move towards
> > using dev_pm_opp_set_config()->_opp_set_required_devs(), which would allow 
> > us to
> > use the helpers that $subject series is adding.
> >
> > Kind regards
> > Ulf Hansson
> 
> Rafael, Greg, do have any objections to this series or would you be
> okay that I queue this up via my pmdomain tree?

I'll defer to Rafael here.



Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size

2024-01-26 Thread Linus Torvalds
On Fri, 26 Jan 2024 at 10:41, Steven Rostedt  wrote:
>
> Fine, but I still plan on sending you the update to give all files unique
> inode numbers. If it screws up tar, it could possibly screw up something
> else.

Well, that in many ways just regularizes the code, and the dynamic
inode numbers are actually prettier than the odd fixed date-based one
you picked. I assume it's your birthdate (although I don't know what
the directory ino number was).

   Linus



Re: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Borislav Petkov
On Fri, Jan 26, 2024 at 05:10:20PM +, Luck, Tony wrote:
> 12 extra bytes divided by (say) 64GB (a very small server these days, may 
> laptop has that much)
>= 0.0001746%
> 
> We will need 57000 changes like this one before we get to 0.001% :-)

You're forgetting that those 12 bytes repeat per MCE tracepoint logged.
And there's other code which adds more 0.01% here and there, well,
because we can.

> But the key there is keeping the details of the source machine attached to
> the error record. My first contact with machine check debugging is always
> just the raw error record (from mcelog, rasdaemon, or console log).

Yes, that is somewhat sensible reason to have the PPIN together with the
MCE record.

> Knowing which microcode version was loaded on a core *at the time of
> the error* is critical. 

So is the rest of the debug info you're going to need from that machine.
And yet we're not adding that to the tracepoint.

> You've spent enough time with Ashok and Thomas tweaking the Linux
> microcode driver to know that going back to the machine the next day
> to ask about microcode version has a bunch of ways to get a wrong
> answer.

Huh, what does that have to do with this?

IIUC, if someone changes something on the system, whether that is
updating microcode or swapping a harddrive or swapping memory or
whatever, that invalidates the errors reported, pretty much.

You can't put it all in the trace record, you just can't. 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size

2024-01-26 Thread Steven Rostedt
On Fri, 26 Jan 2024 10:31:07 -0800
Linus Torvalds  wrote:

> On Fri, 26 Jan 2024 at 10:18, Steven Rostedt  wrote:
> >
> > By following what sysfs does, and give files a default size of PAGE_SIZE,
> > it allows the tar to work. No event file is greater than PAGE_SIZE.  
> 
> No, please. Just don't.
> 
> Nobody has asked for this, and nobody sane should use 'tar' on tracefs anyway.
> 
> It hasn't worked before, so saying "now you can use tar" is just a
> *bad* idea. There is no upside, only downsides, with tar either (a)
> not working at all on older kernels or (b) complaining about how the
> size isn't reliable on newer ones.
> 
> So please. You should *NOT* look at "this makes tar work, albeit badly".
> 
> You should look at whether it improves REAL LOADS. And it doesn't. All
> it does is add a hack for a bad idea. Leave it alone.
>

Fine, but I still plan on sending you the update to give all files unique
inode numbers. If it screws up tar, it could possibly screw up something
else. And all the files use to have unique numbers. They are just not unique
in the current -rc release. You have a point that this would just fix this
release and the older kernels would still be broken, but the identical inode
numbers is something I don't want to later find out breaks something.

-- Steve



Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size

2024-01-26 Thread Linus Torvalds
On Fri, 26 Jan 2024 at 10:18, Steven Rostedt  wrote:
>
> By following what sysfs does, and give files a default size of PAGE_SIZE,
> it allows the tar to work. No event file is greater than PAGE_SIZE.

No, please. Just don't.

Nobody has asked for this, and nobody sane should use 'tar' on tracefs anyway.

It hasn't worked before, so saying "now you can use tar" is just a
*bad* idea. There is no upside, only downsides, with tar either (a)
not working at all on older kernels or (b) complaining about how the
size isn't reliable on newer ones.

So please. You should *NOT* look at "this makes tar work, albeit badly".

You should look at whether it improves REAL LOADS. And it doesn't. All
it does is add a hack for a bad idea. Leave it alone.

   Linus



[PATCH] eventfs: Give files a default of PAGE_SIZE size

2024-01-26 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The sqlhist utility[1] (or the new trace-cmd sqlhist command[2]) creates
the commands of a synthetic event based on the files in the tracefs/events
directory. When creating these commands for an embedded system, it is
asked to copy the files to the temp directory, tar them up and send them
to the main machine. Ideally, tar could be used directly without copying
them to the /tmp directory. But because the sizes presented by the inodes
are of size '0' the tar just copies zero bytes of the file making it
useless.

By following what sysfs does, and give files a default size of PAGE_SIZE,
it allows the tar to work. No event file is greater than PAGE_SIZE.

Although tar will give an error of:

 tar: events/raw_syscalls/filter: File shrank by 3904 bytes; padding with zeros

Which is consistent to the error it gives to sysfs as well:

~# (cd /sys/ && tar cf - firmware) | tar xf -
 [..]
 tar: firmware/acpi/interrupts/ff_slp_btn: File shrank by 4057 bytes; padding 
with zeros

But only add size if the file can be open for read. It doesn't make sense
for files that are write-only.

[1] https://trace-cmd.org/Documentation/libtracefs/libtracefs-sqlhist.html
[2] https://trace-cmd.org/Documentation/trace-cmd/trace-cmd-sqlhist.1.html

Link: 
https://lore.kernel.org/all/camuhmdu-+rmngwjwphypjvcaoe3no37cu8mslvqepdbyk8q...@mail.gmail.com/

Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 7be7a694b106..013b8af40a4f 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -363,6 +363,8 @@ static struct dentry *create_file(const char *name, umode_t 
mode,
inode->i_fop = fop;
inode->i_private = data;
inode->i_ino = ino;
+   if (mode & (S_IRUSR | S_IRGRP | S_IROTH))
+   inode->i_size = PAGE_SIZE;
 
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
-- 
2.43.0




Re: [PATCH v2 1/4] remoteproc: Add TEE support

2024-01-26 Thread Mathieu Poirier
On Thu, Jan 18, 2024 at 11:04:30AM +0100, Arnaud Pouliquen wrote:
> From: Arnaud Pouliquen 
> 
> Add a remoteproc TEE (Trusted Execution Environment) device
> that will be probed by the TEE bus. If the associated Trusted
> application is supported on secure part this device offers a client
> interface to load a firmware in the secure part.
> This firmware could be authenticated and decrypted by the secure
> trusted application.
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
>  drivers/remoteproc/Kconfig  |   9 +
>  drivers/remoteproc/Makefile |   1 +
>  drivers/remoteproc/tee_remoteproc.c | 393 
>  include/linux/tee_remoteproc.h  |  99 +++
>  4 files changed, 502 insertions(+)
>  create mode 100644 drivers/remoteproc/tee_remoteproc.c
>  create mode 100644 include/linux/tee_remoteproc.h
> 
> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> index 48845dc8fa85..85299606806c 100644
> --- a/drivers/remoteproc/Kconfig
> +++ b/drivers/remoteproc/Kconfig
> @@ -365,6 +365,15 @@ config XLNX_R5_REMOTEPROC
>  
> It's safe to say N if not interested in using RPU r5f cores.
>  
> +
> +config TEE_REMOTEPROC
> + tristate "trusted firmware support by a TEE application"
> + depends on OPTEE
> + help
> +   Support for trusted remote processors firmware. The firmware
> +   authentication and/or decryption are managed by a trusted application.
> +   This can be either built-in or a loadable module.
> +
>  endif # REMOTEPROC
>  
>  endmenu
> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> index 91314a9b43ce..fa8daebce277 100644
> --- a/drivers/remoteproc/Makefile
> +++ b/drivers/remoteproc/Makefile
> @@ -36,6 +36,7 @@ obj-$(CONFIG_RCAR_REMOTEPROC)   += rcar_rproc.o
>  obj-$(CONFIG_ST_REMOTEPROC)  += st_remoteproc.o
>  obj-$(CONFIG_ST_SLIM_REMOTEPROC) += st_slim_rproc.o
>  obj-$(CONFIG_STM32_RPROC)+= stm32_rproc.o
> +obj-$(CONFIG_TEE_REMOTEPROC) += tee_remoteproc.o
>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)   += ti_k3_dsp_remoteproc.o
>  obj-$(CONFIG_TI_K3_R5_REMOTEPROC)+= ti_k3_r5_remoteproc.o
>  obj-$(CONFIG_XLNX_R5_REMOTEPROC) += xlnx_r5_remoteproc.o
> diff --git a/drivers/remoteproc/tee_remoteproc.c 
> b/drivers/remoteproc/tee_remoteproc.c
> new file mode 100644
> index ..49e1e0caf889
> --- /dev/null
> +++ b/drivers/remoteproc/tee_remoteproc.c
> @@ -0,0 +1,393 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) STMicroelectronics 2023 - All Rights Reserved
> + * Author: Arnaud Pouliquen 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "remoteproc_internal.h"
> +
> +#define MAX_TEE_PARAM_ARRY_MEMBER4
> +
> +/*
> + * Authentication of the firmware and load in the remote processor memory
> + *
> + * [in]  params[0].value.a:  unique 32bit identifier of the remote processor
> + * [in]   params[1].memref:  buffer containing the image of the 
> buffer
> + */
> +#define TA_RPROC_FW_CMD_LOAD_FW  1
> +
> +/*
> + * Start the remote processor
> + *
> + * [in]  params[0].value.a:  unique 32bit identifier of the remote processor
> + */
> +#define TA_RPROC_FW_CMD_START_FW 2
> +
> +/*
> + * Stop the remote processor
> + *
> + * [in]  params[0].value.a:  unique 32bit identifier of the remote processor
> + */
> +#define TA_RPROC_FW_CMD_STOP_FW  3
> +
> +/*
> + * Return the address of the resource table, or 0 if not found
> + * No check is done to verify that the address returned is accessible by
> + * the non secure context. If the resource table is loaded in a protected
> + * memory the access by the non secure context will lead to a data abort.
> + *
> + * [in]  params[0].value.a:  unique 32bit identifier of the remote processor
> + * [out]  params[1].value.a: 32bit LSB resource table memory address
> + * [out]  params[1].value.b: 32bit MSB resource table memory address
> + * [out]  params[2].value.a: 32bit LSB resource table memory size
> + * [out]  params[2].value.b: 32bit MSB resource table memory size
> + */
> +#define TA_RPROC_FW_CMD_GET_RSC_TABLE4
> +
> +/*
> + * Return the address of the core dump
> + *
> + * [in]  params[0].value.a:  unique 32bit identifier of the remote processor
> + * [out] params[1].memref:   address of the core dump image if exist,
> + *   else return Null
> + */
> +#define TA_RPROC_FW_CMD_GET_COREDUMP 5
> +
> +struct tee_rproc_mem {
> + char name[20];
> + void __iomem *cpu_addr;
> + phys_addr_t bus_addr;
> + u32 dev_addr;
> + size_t size;
> +};
> +
> +struct tee_rproc_context {
> + struct list_head sessions;
> + struct tee_context *tee_ctx;
> + struct device *dev;
> +};
> +
> +static struct tee_rproc_context *tee_rproc_ctx;
> +
> +static void prepare_args(struct tee_rproc *trproc, 

Re: [PATCH v4 2/2] remoteproc: enhance rproc_put() for clusters

2024-01-26 Thread Bjorn Andersson
On Wed, Jan 03, 2024 at 02:11:25PM -0800, Tanmay Shah wrote:
> This patch enhances rproc_put() to support remoteproc clusters
> with multiple child nodes as in rproc_get_by_phandle().
> 
> Signed-off-by: Tarak Reddy 
> Signed-off-by: Tanmay Shah 

As described in the first patch, this documents that Tarak first
certified the origin of this patch, then you certify the origin as you
handle the patch.

But according to From: you're the author, so how could Tarak have
certified the origin before you authored the patch?

Either correct the author, or add Co-developed-by, if that's what
happened.

> ---
>  drivers/remoteproc/remoteproc_core.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c 
> b/drivers/remoteproc/remoteproc_core.c
> index 0b3b34085e2f..f276956f2c5c 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -2554,7 +2554,11 @@ EXPORT_SYMBOL(rproc_free);
>   */
>  void rproc_put(struct rproc *rproc)
>  {
> - module_put(rproc->dev.parent->driver->owner);
> + if (rproc->dev.parent->driver)
> + module_put(rproc->dev.parent->driver->owner);
> + else
> + module_put(rproc->dev.parent->parent->driver->owner);
> +

This does however highlight a bug that was introduced by patch 1, please
avoid this by squashing the two patches together (and use
Co-developed-by as needed).

Regards,
Bjorn

>   put_device(>dev);
>  }
>  EXPORT_SYMBOL(rproc_put);
> -- 
> 2.25.1
> 



Re: [PATCH v4 1/2] remoteproc: Make rproc_get_by_phandle() work for clusters

2024-01-26 Thread Bjorn Andersson
On Wed, Jan 03, 2024 at 02:11:24PM -0800, Tanmay Shah wrote:
> From: Mathieu Poirier 
> 
> Multi-cluster remoteproc designs typically have the following DT
> declaration:
> 
>   remoteproc_cluster {
>   compatible = "soc,remoteproc-cluster";
> 
> core0: core0 {
>   compatible = "soc,remoteproc-core"
> memory-region;
> sram;
> };
> 
> core1: core1 {
>   compatible = "soc,remoteproc-core"
> memory-region;
> sram;
> }
> };

The indention of this snippet looks weird in my client, because it
contains a mixture of tabs and spaces. Please clean that up, and while
at it, '_' is not a valid character in DT node names...

> 
> A driver exists for the cluster rather than the individual cores
> themselves so that operation mode and HW specific configurations
> applicable to the cluster can be made.
> 
> Because the driver exists at the cluster level and not the individual
> core level, function rproc_get_by_phandle() fails to return the
> remoteproc associated with the phandled it is called for.
> 
> This patch enhances rproc_get_by_phandle() by looking for the cluster's
> driver when the driver for the immediate remoteproc's parent is not
> found.
> 
> Reported-by: Ben Levinsky 
> Signed-off-by: Mathieu Poirier 

The s-o-b is used to certify the origin of the patch, Mathieu provided
his signature here, then as you handle the patch you need to append your
s-o-b to provide the same certification.

The for appropriate tracking of reality, Mathieu should append his s-o-b
when/if he applies the patch.

TL;DR please add your S-o-b after Mathieu's.


Change itself looks good to me.

Regards,
Bjorn

> ---
>  drivers/remoteproc/remoteproc_core.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c 
> b/drivers/remoteproc/remoteproc_core.c
> index 695cce218e8c..0b3b34085e2f 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2112,6 +2113,7 @@ EXPORT_SYMBOL(rproc_detach);
>  struct rproc *rproc_get_by_phandle(phandle phandle)
>  {
>   struct rproc *rproc = NULL, *r;
> + struct device_driver *driver;
>   struct device_node *np;
>  
>   np = of_find_node_by_phandle(phandle);
> @@ -2122,7 +2124,26 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
>   list_for_each_entry_rcu(r, _list, node) {
>   if (r->dev.parent && device_match_of_node(r->dev.parent, np)) {
>   /* prevent underlying implementation from being removed 
> */
> - if (!try_module_get(r->dev.parent->driver->owner)) {
> +
> + /*
> +  * If the remoteproc's parent has a driver, the
> +  * remoteproc is not part of a cluster and we can use
> +  * that driver.
> +  */
> + driver = r->dev.parent->driver;
> +
> + /*
> +  * If the remoteproc's parent does not have a driver,
> +  * look for the driver associated with the cluster.
> +  */
> + if (!driver) {
> + if (r->dev.parent->parent)
> + driver = r->dev.parent->parent->driver;
> + if (!driver)
> + break;
> + }
> +
> + if (!try_module_get(driver->owner)) {
>   dev_err(>dev, "can't get owner\n");
>   break;
>   }
> -- 
> 2.25.1
> 



Re: [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-01-26 Thread Mathieu Poirier
On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote:
> The new TEE remoteproc device is used to manage remote firmware in a
> secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
> introduced to delegate the loading of the firmware to the trusted
> execution context. In such cases, the firmware should be signed and
> adhere to the image format defined by the TEE.
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
> V1 to V2 update:
> - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as detected by
>   the kernel test robot:
>  WARNING: unmet direct dependencies detected for TEE_REMOTEPROC
>  Depends on [n]: REMOTEPROC [=y] && OPTEE [=n]
>  Selected by [y]:
>  - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && REMOTEPROC 
> [=y]
> - Fix initialized trproc variable in  stm32_rproc_probe
> ---
>  drivers/remoteproc/stm32_rproc.c | 149 +--
>  1 file changed, 144 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/remoteproc/stm32_rproc.c 
> b/drivers/remoteproc/stm32_rproc.c
> index fcc0001e2657..cf6a21bac945 100644
> --- a/drivers/remoteproc/stm32_rproc.c
> +++ b/drivers/remoteproc/stm32_rproc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "remoteproc_internal.h"
> @@ -49,6 +50,9 @@
>  #define M4_STATE_STANDBY 4
>  #define M4_STATE_CRASH   5
>  
> +/* Remote processor unique identifier aligned with the Trusted Execution 
> Environment definitions */
> +#define STM32_MP1_M4_PROC_ID0
> +
>  struct stm32_syscon {
>   struct regmap *map;
>   u32 reg;
> @@ -90,6 +94,8 @@ struct stm32_rproc {
>   struct stm32_mbox mb[MBOX_NB_MBX];
>   struct workqueue_struct *workqueue;
>   bool hold_boot_smc;
> + bool fw_loaded;
> + struct tee_rproc *trproc;
>   void __iomem *rsc_va;
>  };
>  
> @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc *rproc)
>   return err;
>  }
>  
> +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc,
> + const struct firmware *fw)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> + unsigned int ret = 0;
> +
> + if (rproc->state == RPROC_DETACHED)
> + return 0;
> +
> + ret = tee_rproc_load_fw(ddata->trproc, fw);
> + if (!ret)
> + ddata->fw_loaded = true;
> +
> + return ret;
> +}
> +
> +static int stm32_rproc_tee_elf_load(struct rproc *rproc,
> + const struct firmware *fw)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> + unsigned int ret;
> +
> + /*
> +  * This function can be called by remote proc for recovery
> +  * without the sanity check. In this case we need to load the firmware
> +  * else nothing done here as the firmware has been preloaded for the
> +  * sanity check to be able to parse it for the resource table.
> +  */

This comment is very confusing - please consider refactoring.  

> + if (ddata->fw_loaded)
> + return 0;
> +

I'm not sure about keeping a flag to indicate the status of the loaded firmware.
It is not done for the non-secure method, I don't see why it would be needed for
the secure one.

> + ret = tee_rproc_load_fw(ddata->trproc, fw);
> + if (ret)
> + return ret;
> + ddata->fw_loaded = true;
> +
> + /* Update the resource table parameters. */
> + if (rproc_tee_get_rsc_table(ddata->trproc)) {
> + /* No resource table: reset the related fields. */
> + rproc->cached_table = NULL;
> + rproc->table_ptr = NULL;
> + rproc->table_sz = 0;
> + }
> +
> + return 0;
> +}
> +
> +static struct resource_table *
> +stm32_rproc_tee_elf_find_loaded_rsc_table(struct rproc *rproc,
> +   const struct firmware *fw)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> +
> + return tee_rproc_get_loaded_rsc_table(ddata->trproc);
> +}
> +
> +static int stm32_rproc_tee_start(struct rproc *rproc)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> +
> + return tee_rproc_start(ddata->trproc);
> +}
> +
> +static int stm32_rproc_tee_attach(struct rproc *rproc)
> +{
> + /* Nothing to do, remote proc already started by the secured context. */
> + return 0;
> +}
> +
> +static int stm32_rproc_tee_stop(struct rproc *rproc)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> + int err;
> +
> + stm32_rproc_request_shutdown(rproc);
> +
> + err = tee_rproc_stop(ddata->trproc);
> + if (err)
> + return err;
> +
> + ddata->fw_loaded = false;
> +
> + return stm32_rproc_release(rproc);
> +}
> +
>  static int stm32_rproc_prepare(struct rproc *rproc)
>  {
>   struct device *dev = rproc->dev.parent;
> @@ -319,7 +410,14 @@ static int stm32_rproc_prepare(struct rproc *rproc)
>  
>  static int stm32_rproc_parse_fw(struct rproc *rproc, const 

RE: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Luck, Tony
> > 8 bytes for PPIN, 4 more for microcode.
>
> I know, nothing leads to bloat like 0.01% here, 0.001% there...

12 extra bytes divided by (say) 64GB (a very small server these days, may 
laptop has that much)
   = 0.0001746%

We will need 57000 changes like this one before we get to 0.001% :-)

> > Number of recoverable machine checks per system  I hope the
> > monthly rate should be countable on my fingers...
>
> That's not the point. Rather, when you look at MCE reports, you pretty
> much almost always go and collect additional information from the target
> machine because you want to figure out what exactly is going on.
>
> So what's stopping you from collecting all that static information
> instead of parrotting it through the tracepoint with each error?

PPIN is static. So with good tracking to keep source platform information
attached to the error record as it gets passed around people trying to triage
the problem, you could say it can be retrieved later (or earlier when setting
up a database of attributes of each machine in the fleet.

But the key there is keeping the details of the source machine attached to
the error record. My first contact with machine check debugging is always
just the raw error record (from mcelog, rasdaemon, or console log).

> > PPIN is useful when talking to the CPU vendor about patterns of
> > similar errors seen across a cluster.
>
> I guess that is perhaps the only thing of the two that makes some sense
> at least - the identifier uniquely describes which CPU the error comes
> from...
>
> > MICROCODE - gives a fast path to root cause problems that have already
> > been fixed in a microcode update.
>
> But that, nah. See above.

Knowing which microcode version was loaded on a core *at the time of the error*
is critical. You've spent enough time with Ashok and Thomas tweaking the Linux
microcode driver to know that going back to the machine the next day to ask 
about
microcode version has a bunch of ways to get a wrong answer.

-Tony


Re: [PATCH RFC v3 19/35] arm64: mte: Discover tag storage memory

2024-01-26 Thread Alexandru Elisei
Hi Krzysztof,

On Fri, Jan 26, 2024 at 09:50:58AM +0100, Krzysztof Kozlowski wrote:
> On 25/01/2024 17:42, Alexandru Elisei wrote:
> > Allow the kernel to get the base address, size, block size and associated
> > memory node for tag storage from the device tree blob.
> > 
> 
> Please use scripts/get_maintainers.pl to get a list of necessary people
> and lists to CC. It might happen, that command when run on an older
> kernel, gives you outdated entries. Therefore please be sure you base
> your patches on recent Linux kernel.
> 
> Tools like b4 or scripts_getmaintainer.pl provide you proper list of
> people, so fix your workflow. Tools might also fail if you work on some
> ancient tree (don't, use mainline), work on fork of kernel (don't, use
> mainline) or you ignore some maintainers (really don't). Just use b4 and
> all the problems go away.
> 
> You missed at least devicetree list (maybe more), so this won't be
> tested by automated tooling. Performing review on untested code might be
> a waste of time, thus I will skip this patch entirely till you follow
> the process allowing the patch to be tested.
> 
> Please kindly resend and include all necessary To/Cc entries.

My mistake, the previous iteration of the series didn't include a
devicetree binding and I forgot to update the To/Cc list. Thank you for the
heads-up, hopefully you can have a look after I resend the series.

> 
> 
> > A tag storage region represents the smallest contiguous memory region that
> > holds all the tags for the associated contiguous memory region which can be
> > tagged. For example, for a 32GB contiguous tagged memory the corresponding
> > tag storage region is exactly 1GB of contiguous memory, not two adjacent
> > 512M of tag storage memory, nor one 2GB tag storage region.
> > 
> > Tag storage is described as reserved memory; future patches will teach the
> > kernel how to make use of it for data (non-tagged) allocations.
> > 
> > Signed-off-by: Alexandru Elisei 
> > ---
> > 
> > Changes since rfc v2:
> > 
> > * Reworked from rfc v2 patch #11 ("arm64: mte: Reserve tag storage memory").
> > * Added device tree schema (Rob Herring)
> > * Tag storage memory is now described in the "reserved-memory" node (Rob
> > Herring).
> > 
> >  .../reserved-memory/arm,mte-tag-storage.yaml  |  78 +
> 
> Please run scripts/checkpatch.pl and fix reported warnings. Some
> warnings can be ignored, but the code here looks like it needs a fix.
> Feel free to get in touch if the warning is not clear.

Thank you for pointing it out, I'll move the binding to a separate patch.

Alex



Re: [PATCH] tracing/trigger: Fix to return error if failed to alloc snapshot

2024-01-26 Thread Steven Rostedt
On Fri, 26 Jan 2024 09:42:58 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Masami Hiramatsu (Google) 
> 
> Fix register_snapshot_trigger() to return error code if it failed to
> allocate a snapshot instead of 0 (success). Unless that, it will register
> snapshot trigger without an error.
> 
> Fixes: 0bbe7f719985 ("tracing: Fix the race between registering 'snapshot' 
> event trigger and triggering 'snapshot' operation")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) 

Thanks Masami, I'll apply this.

-- Steve

> ---
>  kernel/trace/trace_events_trigger.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_trigger.c 
> b/kernel/trace/trace_events_trigger.c
> index 46439e3bcec4..b33c3861fbbb 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1470,8 +1470,10 @@ register_snapshot_trigger(char *glob,
> struct event_trigger_data *data,
> struct trace_event_file *file)
>  {
> - if (tracing_alloc_snapshot_instance(file->tr) != 0)
> - return 0;
> + int ret = tracing_alloc_snapshot_instance(file->tr);
> +
> + if (ret < 0)
> + return ret;
>  
>   return register_trigger(glob, data, file);
>  }




Re: [PATCH v7 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation

2024-01-26 Thread Haitao Huang

On Fri, 26 Jan 2024 08:37:25 -0600, Huang, Kai  wrote:





Signed-off-by: Haitao Huang 
Reported-by: Mikko Ylinen 
---


Non-technical staff:

I believe checkpatch requires you to have a "Closes" tag after  
"Reported-by"

otherwise it complains something like this:

  WARNING: Reported-by: should be immediately followed by Closes: with a  
URL

  to the report

Not sure how strict this rule is, but seems you forgot to run checkpatch  
so just

a reminder.


Yeah I did remember run checkpatch this time and last time :-)
Here is what I see from documentation[1]:"The tag should be followed by a  
Closes: tag pointing to the report, unless the report is not available on  
the web". So it seems to allow exceptions.
Mikko mentioned this issue to me in private IM so this is not available on  
web.


Thanks
Haitao

https://docs.kernel.org/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes



Re: [PATCH v2 0/5] PM: domains: Add helpers for multi PM domains to avoid open-coding

2024-01-26 Thread Ulf Hansson
On Fri, 5 Jan 2024 at 17:01, Ulf Hansson  wrote:
>
> Updates in v2:
> - Ccing Daniel Baluta and Iuliana Prodan the NXP remoteproc patches to
> requests help with testing.
> - Fixed NULL pointer bug in patch1, pointed out by Nikunj.
> - Added some tested/reviewed-by tags.
>
>
> Attaching/detaching of a device to multiple PM domains has started to become a
> common operation for many drivers, typically during ->probe() and ->remove().
> In most cases, this has lead to lots of boilerplate code in the drivers.
>
> This series adds a pair of helper functions to manage the attach/detach of a
> device to its multiple PM domains. Moreover, a couple of drivers have been
> converted to use the new helpers as a proof of concept.
>
> Note 1)
> The changes in the drivers have only been compile tested, while the helpers
> have been tested along with a couple of local dummy drivers that I have hacked
> up to model both genpd providers and genpd consumers.
>
> Note 2)
> I was struggling to make up mind if we should have a separate helper to attach
> all available power-domains described in DT, rather than providing "NULL" to 
> the
> dev_pm_domain_attach_list(). I decided not to, but please let me know if you
> prefer the other option.
>
> Note 3)
> For OPP integration, as a follow up I am striving to make the
> dev_pm_opp_attach_genpd() redundant. Instead I think we should move towards
> using dev_pm_opp_set_config()->_opp_set_required_devs(), which would allow us 
> to
> use the helpers that $subject series is adding.
>
> Kind regards
> Ulf Hansson

Rafael, Greg, do have any objections to this series or would you be
okay that I queue this up via my pmdomain tree?

Kind regards
Uffe

>
>
> Ulf Hansson (5):
>   PM: domains: Add helper functions to attach/detach multiple PM domains
>   remoteproc: imx_dsp_rproc: Convert to
> dev_pm_domain_attach|detach_list()
>   remoteproc: imx_rproc: Convert to dev_pm_domain_attach|detach_list()
>   remoteproc: qcom_q6v5_adsp: Convert to
> dev_pm_domain_attach|detach_list()
>   media: venus: Convert to dev_pm_domain_attach|detach_list() for vcodec
>
>  drivers/base/power/common.c   | 134 +++
>  drivers/media/platform/qcom/venus/core.c  |  12 +-
>  drivers/media/platform/qcom/venus/core.h  |   7 +-
>  .../media/platform/qcom/venus/pm_helpers.c|  48 ++
>  drivers/remoteproc/imx_dsp_rproc.c|  82 +
>  drivers/remoteproc/imx_rproc.c|  73 +---
>  drivers/remoteproc/qcom_q6v5_adsp.c   | 160 --
>  include/linux/pm_domain.h |  38 +
>  8 files changed, 289 insertions(+), 265 deletions(-)
>
> --
> 2.34.1



Re: [PATCH v2 1/4] remoteproc: Add TEE support

2024-01-26 Thread Mathieu Poirier
On Fri, Jan 26, 2024 at 02:39:33PM +0100, Arnaud POULIQUEN wrote:
> Hi Mathieu,
> 
> On 1/25/24 19:55, Mathieu Poirier wrote:
> > Hi Arnaud,
> > 
> > On Thu, Jan 18, 2024 at 11:04:30AM +0100, Arnaud Pouliquen wrote:
> >> From: Arnaud Pouliquen 
> >>
> >> Add a remoteproc TEE (Trusted Execution Environment) device
> > 
> > Device or driver?  Seems to be the latter...
> 
> Right, driver is better
> 
> > 
> >> that will be probed by the TEE bus. If the associated Trusted
> >> application is supported on secure part this device offers a client
> >> interface to load a firmware in the secure part.
> >> This firmware could be authenticated and decrypted by the secure
> >> trusted application.
> >>
> >> Signed-off-by: Arnaud Pouliquen 
> >> ---
> >>  drivers/remoteproc/Kconfig  |   9 +
> >>  drivers/remoteproc/Makefile |   1 +
> >>  drivers/remoteproc/tee_remoteproc.c | 393 
> >>  include/linux/tee_remoteproc.h  |  99 +++
> >>  4 files changed, 502 insertions(+)
> >>  create mode 100644 drivers/remoteproc/tee_remoteproc.c
> >>  create mode 100644 include/linux/tee_remoteproc.h
> >>
> >> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> >> index 48845dc8fa85..85299606806c 100644
> >> --- a/drivers/remoteproc/Kconfig
> >> +++ b/drivers/remoteproc/Kconfig
> >> @@ -365,6 +365,15 @@ config XLNX_R5_REMOTEPROC
> >>  
> >>  It's safe to say N if not interested in using RPU r5f cores.
> >>  
> >> +
> >> +config TEE_REMOTEPROC
> >> +  tristate "trusted firmware support by a TEE application"
> >> +  depends on OPTEE
> >> +  help
> >> +Support for trusted remote processors firmware. The firmware
> >> +authentication and/or decryption are managed by a trusted application.
> >> +This can be either built-in or a loadable module.
> >> +
> >>  endif # REMOTEPROC
> >>  
> >>  endmenu
> >> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> >> index 91314a9b43ce..fa8daebce277 100644
> >> --- a/drivers/remoteproc/Makefile
> >> +++ b/drivers/remoteproc/Makefile
> >> @@ -36,6 +36,7 @@ obj-$(CONFIG_RCAR_REMOTEPROC)+= rcar_rproc.o
> >>  obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
> >>  obj-$(CONFIG_ST_SLIM_REMOTEPROC)  += st_slim_rproc.o
> >>  obj-$(CONFIG_STM32_RPROC) += stm32_rproc.o
> >> +obj-$(CONFIG_TEE_REMOTEPROC)  += tee_remoteproc.o
> >>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)+= ti_k3_dsp_remoteproc.o
> >>  obj-$(CONFIG_TI_K3_R5_REMOTEPROC) += ti_k3_r5_remoteproc.o
> >>  obj-$(CONFIG_XLNX_R5_REMOTEPROC)  += xlnx_r5_remoteproc.o
> >> diff --git a/drivers/remoteproc/tee_remoteproc.c 
> >> b/drivers/remoteproc/tee_remoteproc.c
> >> new file mode 100644
> >> index ..49e1e0caf889
> >> --- /dev/null
> >> +++ b/drivers/remoteproc/tee_remoteproc.c
> >> @@ -0,0 +1,393 @@
> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> +/*
> >> + * Copyright (C) STMicroelectronics 2023 - All Rights Reserved
> >> + * Author: Arnaud Pouliquen 
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "remoteproc_internal.h"
> >> +
> >> +#define MAX_TEE_PARAM_ARRY_MEMBER 4
> >> +
> >> +/*
> >> + * Authentication of the firmware and load in the remote processor memory
> >> + *
> >> + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
> >> processor
> >> + * [in]params[1].memref:  buffer containing the image of the 
> >> buffer
> >> + */
> >> +#define TA_RPROC_FW_CMD_LOAD_FW   1
> >> +
> >> +/*
> >> + * Start the remote processor
> >> + *
> >> + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
> >> processor
> >> + */
> >> +#define TA_RPROC_FW_CMD_START_FW  2
> >> +
> >> +/*
> >> + * Stop the remote processor
> >> + *
> >> + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
> >> processor
> >> + */
> >> +#define TA_RPROC_FW_CMD_STOP_FW   3
> >> +
> >> +/*
> >> + * Return the address of the resource table, or 0 if not found
> >> + * No check is done to verify that the address returned is accessible by
> >> + * the non secure context. If the resource table is loaded in a protected
> >> + * memory the access by the non secure context will lead to a data abort.
> >> + *
> >> + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
> >> processor
> >> + * [out]  params[1].value.a:  32bit LSB resource table memory address
> >> + * [out]  params[1].value.b:  32bit MSB resource table memory address
> >> + * [out]  params[2].value.a:  32bit LSB resource table memory size
> >> + * [out]  params[2].value.b:  32bit MSB resource table memory size
> >> + */
> >> +#define TA_RPROC_FW_CMD_GET_RSC_TABLE 4
> >> +
> >> +/*
> >> + * Return the address of the core dump
> >> + *
> >> + * [in]  params[0].value.a:   

Re: [PATCH v6 32/36] fprobe: Rewrite fprobe on function-graph tracer

2024-01-26 Thread Google
On Thu, 25 Jan 2024 16:11:51 +0100
Jiri Olsa  wrote:

> On Fri, Jan 12, 2024 at 07:17:06PM +0900, Masami Hiramatsu (Google) wrote:
> 
> SNIP
> 
> >   * Register @fp to ftrace for enabling the probe on the address given by 
> > @addrs.
> > @@ -298,23 +547,27 @@ EXPORT_SYMBOL_GPL(register_fprobe);
> >   */
> >  int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
> >  {
> > -   int ret;
> > -
> > -   if (!fp || !addrs || num <= 0)
> > -   return -EINVAL;
> > -
> > -   fprobe_init(fp);
> > +   struct fprobe_hlist *hlist_array;
> > +   int ret, i;
> >  
> > -   ret = ftrace_set_filter_ips(>ops, addrs, num, 0, 0);
> > +   ret = fprobe_init(fp, addrs, num);
> > if (ret)
> > return ret;
> >  
> > -   ret = fprobe_init_rethook(fp, num);
> > -   if (!ret)
> > -   ret = register_ftrace_function(>ops);
> > +   mutex_lock(_mutex);
> > +
> > +   hlist_array = fp->hlist_array;
> > +   ret = fprobe_graph_add_ips(addrs, num);
> 
> so fprobe_graph_add_ips registers the ftrace_ops and actually starts
> the tracing.. and in the code below we prepare fprobe data that is
> checked in the ftrace_ops callback.. should we do this this earlier
> before calling fprobe_graph_add_ips/register_ftrace_graph?

Good catch! but I think this is safe because fprobe_entry() checks
the fprobe_ip_table[] (insert_fprobe_node updates it) at first.
Thus until the hash table is updated, fprobe_entry() handler will
return soon.


static int fprobe_entry(unsigned long func, unsigned long ret_ip,
struct ftrace_regs *fregs, struct fgraph_ops *gops)
{
struct fprobe_hlist_node *node, *first;
unsigned long *fgraph_data = NULL;
unsigned long header;
int reserved_words;
struct fprobe *fp;
int used, ret;

if (WARN_ON_ONCE(!fregs))
return 0;

first = node = find_first_fprobe_node(func);
if (unlikely(!first))
return 0;


The fprobe_table (add_fprobe_hash updates it) is used for return handler,
that will be used by fprobe_return().

So I think it should be safe too. Or I might missed something?

Thank you,


> 
> jirka
> 
> > +   if (!ret) {
> > +   add_fprobe_hash(fp);
> > +   for (i = 0; i < hlist_array->size; i++)
> > +   insert_fprobe_node(_array->array[i]);
> > +   }
> > +   mutex_unlock(_mutex);
> >  
> > if (ret)
> > fprobe_fail_cleanup(fp);
> > +
> > return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(register_fprobe_ips);
> > @@ -352,14 +605,13 @@ EXPORT_SYMBOL_GPL(register_fprobe_syms);
> 
> SNIP


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v6 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-01-26 Thread Google
On Thu, 25 Jan 2024 15:54:53 +0100
Jiri Olsa  wrote:

> On Fri, Jan 12, 2024 at 07:10:50PM +0900, Masami Hiramatsu (Google) wrote:
> > Hi,
> > 
> > Here is the 6th version of the series to re-implement the fprobe on
> > function-graph tracer. The previous version is;
> > 
> > https://lore.kernel.org/all/170290509018.220107.1347127510564358608.stgit@devnote2/
> > 
> > This version fixes use-after-unregister bug and arm64 stack unwinding
> > bug [13/36], add an improvement for multiple interrupts during push
> > operation[20/36], keep SAVE_REGS until BPF and fprobe_event using
> > ftrace_regs[26/36], also reorder the patches[30/36][31/36] so that new
> > fprobe can switch to SAVE_ARGS[32/36] safely.
> > This series also temporarily adds a DIRECT_CALLS bugfix[1/36], which
> > should be pushed separatedly as a stable bugfix.
> > 
> > There are some TODOs:
> >  - Add s390x and loongarch support to fprobe (multiple fgraph).
> >  - Fix to get the symbol address from ftrace entry address on arm64.
> >(This should be done in BPF trace event)
> >  - Cleanup code, rename some terms(offset/index) and FGRAPH_TYPE_BITMAP
> >part should be merged to FGRAPH_TYPE_ARRAY patch.
> 
> hi,
> I'm getting kasan bugs below when running bpf selftests on top of this
> patchset.. I think it's probably the reason I see failures in some bpf
> kprobe_multi/fprobe tests
> 
> so far I couldn't find the reason.. still checking ;-)

Thanks for reporting! Have you built the kernel with debuginfo? In that
case, can you decode the line from the address?

$ eu-addr2line -fi -e vmlinux ftrace_push_return_trace.isra.0+0x346

This helps me a lot.

Thank you,

> 
> jirka
> 
> 
> ---
> [  507.585913][  T697] BUG: KASAN: slab-out-of-bounds in 
> ftrace_push_return_trace.isra.0+0x346/0x370
> [  507.586747][  T697] Write of size 8 at addr 888148193ff8 by task 
> test_progs/697
> [  507.587460][  T697] 
> [  507.587713][  T697] CPU: 2 PID: 697 Comm: test_progs Tainted: G   
> OE  6.7.0+ #309 d8e2cbcdc10865c6eb2d28ed0cbf958842aa75a8
> [  507.588821][  T697] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
> BIOS 1.16.2-1.fc38 04/01/2014
> [  507.589681][  T697] Call Trace:
> [  507.590044][  T697]  
> [  507.590357][  T697]  dump_stack_lvl+0xf6/0x180
> [  507.590807][  T697]  print_report+0xc4/0x610
> [  507.591259][  T697]  ? fixup_red_left+0x5/0x20
> [  507.591781][  T697]  kasan_report+0xbe/0xf0
> [  507.592241][  T697]  ? ftrace_push_return_trace.isra.0+0x346/0x370
> [  507.592928][  T697]  ? ftrace_push_return_trace.isra.0+0x346/0x370
> [  507.593535][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> [  507.594076][  T697]  ? ftrace_replace_code+0x17a/0x230
> [  507.594586][  T697]  ftrace_push_return_trace.isra.0+0x346/0x370
> [  507.595192][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> [  507.595747][  T697]  function_graph_enter_ops+0xbb/0x2d0
> [  507.596271][  T697]  ? ftrace_replace_code+0x17a/0x230
> [  507.596784][  T697]  ? __pfx_function_graph_enter_ops+0x10/0x10
> [  507.597353][  T697]  ? preempt_count_sub+0x14/0xc0
> [  507.598576][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> [  507.599145][  T697]  ? __pfx_fuse_sync_fs+0x10/0x10
> [  507.599718][  T697]  ftrace_graph_func+0x142/0x270
> [  507.600293][  T697]  ? __pfx_text_poke_loc_init+0x10/0x10
> [  507.600892][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> [  507.601484][  T697]  0xa0560097
> [  507.602067][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> [  507.602715][  T697]  ? text_poke_loc_init+0x5/0x2e0
> [  507.603288][  T697]  ? __pfx_fuse_conn_put.part.0+0x10/0x10
> [  507.603923][  T697]  text_poke_loc_init+0x5/0x2e0
> [  507.604468][  T697]  ftrace_replace_code+0x17a/0x230
> [  507.605071][  T697]  ftrace_modify_all_code+0x131/0x1a0
> [  507.605663][  T697]  ftrace_startup+0x10b/0x210
> [  507.606200][  T697]  register_ftrace_graph+0x313/0x8a0
> [  507.606805][  T697]  ? register_ftrace_graph+0x3fe/0x8a0
> [  507.607427][  T697]  register_fprobe_ips.part.0+0x25a/0x3f0
> [  507.608090][  T697]  bpf_kprobe_multi_link_attach+0x49e/0x850
> [  507.608781][  T697]  ? __pfx_bpf_kprobe_multi_link_attach+0x10/0x10
> [  507.609500][  T697]  ? __debug_check_no_obj_freed+0x1d8/0x3a0
> [  507.610194][  T697]  ? __fget_light+0x96/0xe0
> [  507.610741][  T697]  __sys_bpf+0x307a/0x3180
> [  507.611286][  T697]  ? __pfx___sys_bpf+0x10/0x10
> [  507.611838][  T697]  ? __kasan_slab_free+0x12d/0x1c0
> [  507.612434][  T697]  ? audit_log_exit+0x8e0/0x1960
> [  507.613003][  T697]  ? kmem_cache_free+0x19d/0x460
> [  507.613644][  T697]  ? rcu_is_watching+0x34/0x60
> [  507.614202][  T697]  ? lockdep_hardirqs_on_prepare+0xe/0x250
> [  507.614865][  T697]  ? 
> seqcount_lockdep_reader_access.constprop.0+0x105/0x120
> [  507.615662][  T697]  ? 
> seqcount_lockdep_reader_access.constprop.0+0xb2/0x120
> [  507.616431][  T697]  __x64_sys_bpf+0x44/0x60
> [  507.616940][  T697]  do_syscall_64+0x87/0x1b0
> [  507.617495][  T697]  

Re: [PATCH 2/4] tracing/user_events: Introduce multi-format events

2024-01-26 Thread Google
On Tue, 23 Jan 2024 22:08:42 +
Beau Belgrave  wrote:

> Add a register_name (reg_name) to the user_event struct which allows for
> split naming of events. We now have the name that was used to register
> within user_events as well as the unique name for the tracepoint. Upon
> registering events ensure matches based on first the reg_name, followed
> by the fields and format of the event. This allows for multiple events
> with the same registered name to have different formats. The underlying
> tracepoint will have a unique name in the format of {reg_name}:[unique_id].
> 
> For example, if both "test u32 value" and "test u64 value" are used with
> the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> tracepoints. The dynamic_events file would then show the following:
>   u:test u64 count
>   u:test u32 count
> 
> The actual tracepoint names look like this:
>   test:[d5874fdac44]
>   test:[d5914662cd4]
> 
> Both would be under the new user_events_multi system name to prevent the
> older ABI from being used to squat on multi-formatted events and block
> their use.
[...]
> @@ -1923,6 +1972,39 @@ static int user_event_trace_register(struct user_event 
> *user)
>   return ret;
>  }
>  
> +static int user_event_set_tp_name(struct user_event *user)
> +{
> + lockdep_assert_held(>group->reg_mutex);
> +
> + if (EVENT_MULTI_FORMAT(user->reg_flags)) {
> + char *multi_name;
> + int len;
> +
> + len = snprintf(NULL, 0, "%s:[%llx]", user->reg_name,
> +user->group->multi_id) + 1;
> +
> + multi_name = kzalloc(len, GFP_KERNEL_ACCOUNT);
> +
> + if (!multi_name)
> + return -ENOMEM;
> +
> + snprintf(multi_name, len, "%s:[%llx]", user->reg_name,
> +  user->group->multi_id);

OK, so the each different event has suffixed name. But this will
introduce non C-variable name.

Steve, do you think your library can handle these symbols? It will
be something like "event:[1]" as the event name.
Personally I like "event.1" style. (of course we need to ensure the
user given event name is NOT including such suffix numbers)

Thank you.

-- 
Masami Hiramatsu (Google) 



Re: [PATCH v7 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation

2024-01-26 Thread Huang, Kai

> 
> Signed-off-by: Haitao Huang 
> Reported-by: Mikko Ylinen 
> ---

Non-technical staff:

I believe checkpatch requires you to have a "Closes" tag after "Reported-by"
otherwise it complains something like this:

  WARNING: Reported-by: should be immediately followed by Closes: with a URL
  to the report

Not sure how strict this rule is, but seems you forgot to run checkpatch so just
a reminder.


Re: [PATCH v2 2/4] dt-bindings: remoteproc: Add compatibility for TEE support

2024-01-26 Thread Arnaud POULIQUEN
Hello Krzysztof,

On 1/26/24 12:03, Krzysztof Kozlowski wrote:
> On 18/01/2024 11:04, Arnaud Pouliquen wrote:
>> The "st,stm32mp1-m4-tee" compatible is utilized in a system configuration
>> where the Cortex-M4 firmware is loaded by the Trusted execution Environment
>> (TEE).
>> For instance, this compatible is used in both the Linux and OP-TEE
>> device-tree:
>> - In OP-TEE, a node is defined in the device tree with the
>>   st,stm32mp1-m4-tee to support signed remoteproc firmware.
>>   Based on DT properties, OP-TEE authenticates, loads, starts, and stops
>>   the firmware.
>> - On Linux, when the compatibility is set, the Cortex-M resets should not
>>   be declared in the device tree.
>>
>> Signed-off-by: Arnaud Pouliquen 
>> ---
>> V1 to V2 updates
>> - update "st,stm32mp1-m4" compatible description to generalize
>> - remove the 'reset-names' requirement in one conditional branch, as the
>>   property is already part of the condition test.
>> ---
>>  .../bindings/remoteproc/st,stm32-rproc.yaml   | 52 +++
>>  1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml 
>> b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> index 370af61d8f28..6af821b15736 100644
>> --- a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> +++ b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> @@ -16,7 +16,12 @@ maintainers:
>>  
>>  properties:
>>compatible:
>> -const: st,stm32mp1-m4
>> +enum:
>> +  - st,stm32mp1-m4
>> +  - st,stm32mp1-m4-tee
> 
> The patch looks good to me, but I wonder about this choice of two
> compatibles.
> 
> Basically this is the same hardware with the same interface, but two
> compatibles to differentiate a bit different firmware setup. We have
> already such cases for Qualcomm [1] [2] and new ones will be coming. [3]
> 
> I wonder whether this should be rather the same compatible with
> additional property, e.g. "st,tee-control" or "remote-control".

Yes the point is valid, I asked myself the question.

I proposed a compatibility solution for one main reason. On the STM32MP15, if
the firmware is loaded by Linux, no driver is probed in OP-TEE. But if the
firmware is authenticated and loaded by OP-TEE, a Op-TEE driver is probed to
manage memory access rights.

The drawback of a property is that we would need to probe the OP-TEE driver for
the STM32MP1 platform even if it is not used, just to check this property.

Thanks,
Arnaud

> 
> [1]
> https://elixir.bootlin.com/linux/v6.7.1/source/Documentation/devicetree/bindings/dma/qcom,bam-dma.yaml#L54
> 
> [2]
> https://elixir.bootlin.com/linux/v6.7.1/source/Documentation/devicetree/bindings/net/qcom,ipa.yaml#L129
> (that's a bit different)
> 
> [3] https://lore.kernel.org/linux-devicetree/20240124103623.GJ4906@thinkpad/
> 
> @Rob,
> Any general guidance for this and Qualcomm?
> 
> Best regards,
> Krzysztof
> 



Re: [PATCH v2 1/4] remoteproc: Add TEE support

2024-01-26 Thread Arnaud POULIQUEN
Hi Mathieu,

On 1/25/24 19:55, Mathieu Poirier wrote:
> Hi Arnaud,
> 
> On Thu, Jan 18, 2024 at 11:04:30AM +0100, Arnaud Pouliquen wrote:
>> From: Arnaud Pouliquen 
>>
>> Add a remoteproc TEE (Trusted Execution Environment) device
> 
> Device or driver?  Seems to be the latter...

Right, driver is better

> 
>> that will be probed by the TEE bus. If the associated Trusted
>> application is supported on secure part this device offers a client
>> interface to load a firmware in the secure part.
>> This firmware could be authenticated and decrypted by the secure
>> trusted application.
>>
>> Signed-off-by: Arnaud Pouliquen 
>> ---
>>  drivers/remoteproc/Kconfig  |   9 +
>>  drivers/remoteproc/Makefile |   1 +
>>  drivers/remoteproc/tee_remoteproc.c | 393 
>>  include/linux/tee_remoteproc.h  |  99 +++
>>  4 files changed, 502 insertions(+)
>>  create mode 100644 drivers/remoteproc/tee_remoteproc.c
>>  create mode 100644 include/linux/tee_remoteproc.h
>>
>> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
>> index 48845dc8fa85..85299606806c 100644
>> --- a/drivers/remoteproc/Kconfig
>> +++ b/drivers/remoteproc/Kconfig
>> @@ -365,6 +365,15 @@ config XLNX_R5_REMOTEPROC
>>  
>>It's safe to say N if not interested in using RPU r5f cores.
>>  
>> +
>> +config TEE_REMOTEPROC
>> +tristate "trusted firmware support by a TEE application"
>> +depends on OPTEE
>> +help
>> +  Support for trusted remote processors firmware. The firmware
>> +  authentication and/or decryption are managed by a trusted application.
>> +  This can be either built-in or a loadable module.
>> +
>>  endif # REMOTEPROC
>>  
>>  endmenu
>> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
>> index 91314a9b43ce..fa8daebce277 100644
>> --- a/drivers/remoteproc/Makefile
>> +++ b/drivers/remoteproc/Makefile
>> @@ -36,6 +36,7 @@ obj-$(CONFIG_RCAR_REMOTEPROC)  += rcar_rproc.o
>>  obj-$(CONFIG_ST_REMOTEPROC) += st_remoteproc.o
>>  obj-$(CONFIG_ST_SLIM_REMOTEPROC)+= st_slim_rproc.o
>>  obj-$(CONFIG_STM32_RPROC)   += stm32_rproc.o
>> +obj-$(CONFIG_TEE_REMOTEPROC)+= tee_remoteproc.o
>>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)  += ti_k3_dsp_remoteproc.o
>>  obj-$(CONFIG_TI_K3_R5_REMOTEPROC)   += ti_k3_r5_remoteproc.o
>>  obj-$(CONFIG_XLNX_R5_REMOTEPROC)+= xlnx_r5_remoteproc.o
>> diff --git a/drivers/remoteproc/tee_remoteproc.c 
>> b/drivers/remoteproc/tee_remoteproc.c
>> new file mode 100644
>> index ..49e1e0caf889
>> --- /dev/null
>> +++ b/drivers/remoteproc/tee_remoteproc.c
>> @@ -0,0 +1,393 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) STMicroelectronics 2023 - All Rights Reserved
>> + * Author: Arnaud Pouliquen 
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "remoteproc_internal.h"
>> +
>> +#define MAX_TEE_PARAM_ARRY_MEMBER   4
>> +
>> +/*
>> + * Authentication of the firmware and load in the remote processor memory
>> + *
>> + * [in]  params[0].value.a: unique 32bit identifier of the remote processor
>> + * [in]  params[1].memref:  buffer containing the image of the 
>> buffer
>> + */
>> +#define TA_RPROC_FW_CMD_LOAD_FW 1
>> +
>> +/*
>> + * Start the remote processor
>> + *
>> + * [in]  params[0].value.a: unique 32bit identifier of the remote processor
>> + */
>> +#define TA_RPROC_FW_CMD_START_FW2
>> +
>> +/*
>> + * Stop the remote processor
>> + *
>> + * [in]  params[0].value.a: unique 32bit identifier of the remote processor
>> + */
>> +#define TA_RPROC_FW_CMD_STOP_FW 3
>> +
>> +/*
>> + * Return the address of the resource table, or 0 if not found
>> + * No check is done to verify that the address returned is accessible by
>> + * the non secure context. If the resource table is loaded in a protected
>> + * memory the access by the non secure context will lead to a data abort.
>> + *
>> + * [in]  params[0].value.a: unique 32bit identifier of the remote processor
>> + * [out]  params[1].value.a:32bit LSB resource table memory address
>> + * [out]  params[1].value.b:32bit MSB resource table memory address
>> + * [out]  params[2].value.a:32bit LSB resource table memory size
>> + * [out]  params[2].value.b:32bit MSB resource table memory size
>> + */
>> +#define TA_RPROC_FW_CMD_GET_RSC_TABLE   4
>> +
>> +/*
>> + * Return the address of the core dump
>> + *
>> + * [in]  params[0].value.a: unique 32bit identifier of the remote processor
>> + * [out] params[1].memref:  address of the core dump image if exist,
>> + *  else return Null
>> + */
>> +#define TA_RPROC_FW_CMD_GET_COREDUMP5
>> +
>> +struct tee_rproc_mem {
>> +char name[20];
>> +void __iomem *cpu_addr;
>> +phys_addr_t bus_addr;
>> +u32 

Re: [PATCH v2 2/4] dt-bindings: remoteproc: Add compatibility for TEE support

2024-01-26 Thread Krzysztof Kozlowski
On 18/01/2024 11:04, Arnaud Pouliquen wrote:
> The "st,stm32mp1-m4-tee" compatible is utilized in a system configuration
> where the Cortex-M4 firmware is loaded by the Trusted execution Environment
> (TEE).
> For instance, this compatible is used in both the Linux and OP-TEE
> device-tree:
> - In OP-TEE, a node is defined in the device tree with the
>   st,stm32mp1-m4-tee to support signed remoteproc firmware.
>   Based on DT properties, OP-TEE authenticates, loads, starts, and stops
>   the firmware.
> - On Linux, when the compatibility is set, the Cortex-M resets should not
>   be declared in the device tree.
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
> V1 to V2 updates
> - update "st,stm32mp1-m4" compatible description to generalize
> - remove the 'reset-names' requirement in one conditional branch, as the
>   property is already part of the condition test.
> ---
>  .../bindings/remoteproc/st,stm32-rproc.yaml   | 52 +++
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml 
> b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
> index 370af61d8f28..6af821b15736 100644
> --- a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
> +++ b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
> @@ -16,7 +16,12 @@ maintainers:
>  
>  properties:
>compatible:
> -const: st,stm32mp1-m4
> +enum:
> +  - st,stm32mp1-m4
> +  - st,stm32mp1-m4-tee

The patch looks good to me, but I wonder about this choice of two
compatibles.

Basically this is the same hardware with the same interface, but two
compatibles to differentiate a bit different firmware setup. We have
already such cases for Qualcomm [1] [2] and new ones will be coming. [3]

I wonder whether this should be rather the same compatible with
additional property, e.g. "st,tee-control" or "remote-control".

[1]
https://elixir.bootlin.com/linux/v6.7.1/source/Documentation/devicetree/bindings/dma/qcom,bam-dma.yaml#L54

[2]
https://elixir.bootlin.com/linux/v6.7.1/source/Documentation/devicetree/bindings/net/qcom,ipa.yaml#L129
(that's a bit different)

[3] https://lore.kernel.org/linux-devicetree/20240124103623.GJ4906@thinkpad/

@Rob,
Any general guidance for this and Qualcomm?

Best regards,
Krzysztof




Re: [PATCH v2 0/2] Update mce_record tracepoint

2024-01-26 Thread Borislav Petkov
On Thu, Jan 25, 2024 at 07:19:22PM +, Luck, Tony wrote:
> 8 bytes for PPIN, 4 more for microcode.

I know, nothing leads to bloat like 0.01% here, 0.001% there...

> Number of recoverable machine checks per system  I hope the
> monthly rate should be countable on my fingers...

That's not the point. Rather, when you look at MCE reports, you pretty
much almost always go and collect additional information from the target
machine because you want to figure out what exactly is going on.

So what's stopping you from collecting all that static information
instead of parrotting it through the tracepoint with each error?

> PPIN is useful when talking to the CPU vendor about patterns of
> similar errors seen across a cluster.

I guess that is perhaps the only thing of the two that makes some sense
at least - the identifier uniquely describes which CPU the error comes
from...

> MICROCODE - gives a fast path to root cause problems that have already
> been fixed in a microcode update.

But that, nah. See above.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH RFC v3 19/35] arm64: mte: Discover tag storage memory

2024-01-26 Thread Krzysztof Kozlowski
On 25/01/2024 17:42, Alexandru Elisei wrote:
> Allow the kernel to get the base address, size, block size and associated
> memory node for tag storage from the device tree blob.
> 

Please use scripts/get_maintainers.pl to get a list of necessary people
and lists to CC. It might happen, that command when run on an older
kernel, gives you outdated entries. Therefore please be sure you base
your patches on recent Linux kernel.

Tools like b4 or scripts_getmaintainer.pl provide you proper list of
people, so fix your workflow. Tools might also fail if you work on some
ancient tree (don't, use mainline), work on fork of kernel (don't, use
mainline) or you ignore some maintainers (really don't). Just use b4 and
all the problems go away.

You missed at least devicetree list (maybe more), so this won't be
tested by automated tooling. Performing review on untested code might be
a waste of time, thus I will skip this patch entirely till you follow
the process allowing the patch to be tested.

Please kindly resend and include all necessary To/Cc entries.


> A tag storage region represents the smallest contiguous memory region that
> holds all the tags for the associated contiguous memory region which can be
> tagged. For example, for a 32GB contiguous tagged memory the corresponding
> tag storage region is exactly 1GB of contiguous memory, not two adjacent
> 512M of tag storage memory, nor one 2GB tag storage region.
> 
> Tag storage is described as reserved memory; future patches will teach the
> kernel how to make use of it for data (non-tagged) allocations.
> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * Reworked from rfc v2 patch #11 ("arm64: mte: Reserve tag storage memory").
> * Added device tree schema (Rob Herring)
> * Tag storage memory is now described in the "reserved-memory" node (Rob
> Herring).
> 
>  .../reserved-memory/arm,mte-tag-storage.yaml  |  78 +

Please run scripts/checkpatch.pl and fix reported warnings. Some
warnings can be ignored, but the code here looks like it needs a fix.
Feel free to get in touch if the warning is not clear.


Best regards,
Krzysztof




[PATCH v3 2/2] tracing: Include Microcode Revision in mce_record tracepoint

2024-01-26 Thread Avadhut Naik
Currently, the microcode field (Microcode Revision) of struct mce is not
exported to userspace through the mce_record tracepoint.

Export it through the tracepoint as it may provide useful information for
debug and analysis.

Signed-off-by: Avadhut Naik 
Reviewed-by: Sohil Mehta 
---
 include/trace/events/mce.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 41a5431545e2..2857f6801e73 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -34,6 +34,7 @@ TRACE_EVENT(mce_record,
__field(u8, cs  )
__field(u8, bank)
__field(u8, cpuvendor   )
+   __field(u32,microcode   )
),
 
TP_fast_assign(
@@ -55,9 +56,10 @@ TRACE_EVENT(mce_record,
__entry->cs = m->cs;
__entry->bank   = m->bank;
__entry->cpuvendor  = m->cpuvendor;
+   __entry->microcode  = m->microcode;
),
 
-   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: 
%llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: 
%llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, MICROCODE REVISION: 
%x",
__entry->cpu,
__entry->mcgcap, __entry->mcgstatus,
__entry->bank, __entry->status,
@@ -69,7 +71,8 @@ TRACE_EVENT(mce_record,
__entry->cpuvendor, __entry->cpuid,
__entry->walltime,
__entry->socketid,
-   __entry->apicid)
+   __entry->apicid,
+   __entry->microcode)
 );
 
 #endif /* _TRACE_MCE_H */
-- 
2.34.1




[PATCH v3 1/2] tracing: Include PPIN in mce_record tracepoint

2024-01-26 Thread Avadhut Naik
Machine Check Error information from struct mce is exported to userspace
through the mce_record tracepoint.

Currently, however, the PPIN (Protected Processor Inventory Number) field
of struct mce is not exported through the tracepoint.

Export PPIN through the tracepoint as it may provide useful information
for debug and analysis.

Signed-off-by: Avadhut Naik 
Reviewed-by: Sohil Mehta 
---
 include/trace/events/mce.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 1391ada0da3b..41a5431545e2 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -25,6 +25,7 @@ TRACE_EVENT(mce_record,
__field(u64,ipid)
__field(u64,ip  )
__field(u64,tsc )
+   __field(u64,ppin)
__field(u64,walltime)
__field(u32,cpu )
__field(u32,cpuid   )
@@ -45,6 +46,7 @@ TRACE_EVENT(mce_record,
__entry->ipid   = m->ipid;
__entry->ip = m->ip;
__entry->tsc= m->tsc;
+   __entry->ppin   = m->ppin;
__entry->walltime   = m->time;
__entry->cpu= m->extcpu;
__entry->cpuid  = m->cpuid;
@@ -55,7 +57,7 @@ TRACE_EVENT(mce_record,
__entry->cpuvendor  = m->cpuvendor;
),
 
-   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: 
%u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: 
%llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
__entry->cpu,
__entry->mcgcap, __entry->mcgstatus,
__entry->bank, __entry->status,
@@ -63,6 +65,7 @@ TRACE_EVENT(mce_record,
__entry->addr, __entry->misc, __entry->synd,
__entry->cs, __entry->ip,
__entry->tsc,
+   __entry->ppin,
__entry->cpuvendor, __entry->cpuid,
__entry->walltime,
__entry->socketid,
-- 
2.34.1




[PATCH v3 0/2] Update mce_record tracepoint

2024-01-26 Thread Avadhut Naik
This patchset updates the mce_record tracepoint so that the recently added
fields of struct mce are exported through it to userspace.

The first patch adds PPIN (Protected Processor Inventory Number) field to
the tracepoint.

The second patch adds the microcode field (Microcode Revision) to the
tracepoint.

Changes in v2:
 - Export microcode field (Microcode Revision) through the tracepoiont in
   addition to PPIN.

Changes in v3:
 - Change format specifier for microcode revision from %u to %x
 - Fix tab alignments
 - Add Reviewed-by: Sohil Mehta 

Avadhut Naik (2):
  tracing: Include PPIN in mce_record tracepoint
  tracing: Include Microcode Revision in mce_record tracepoint

 include/trace/events/mce.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)


base-commit: 0d4f19418f067465b0a84a287d9a51e443a0bc3a
-- 
2.34.1




Re: [PATCH] arm64: dts: qcom: sm6350: Add tsens thermal zones

2024-01-26 Thread Luca Weiss
On Thu Jan 25, 2024 at 5:30 PM CET, Konrad Dybcio wrote:
>
>
> On 1/24/24 16:31, Luca Weiss wrote:
> > Add the definitions for the various thermal zones found on the SM6350
> > SoC. Hooking up GPU and CPU cooling can limit the clock speeds there to
> > reduce the temperature again to good levels.
> > 
> > Most thermal zones only have one critical temperature configured at
> > 125°C which can be mostly considered a placeholder until those zones can
> > be hooked up to cooling.
> > 
> > Signed-off-by: Luca Weiss 
> > ---
>
> [...]
>
> > +   cpuss0-thermal {
> > +   polling-delay-passive = <0>;
> > +   polling-delay = <0>;
> > +
> > +   thermal-sensors = < 7>;
>
> cpuss0-thermal and cpuss1-thermal are very likely the sensors for
> cluster0/1, can you test that out, perhaps with corepinning+stress?

Not really, according to my docs the CPUs aren't placed symmetrically on
the SoC and cpuss0 and cpuss1 are just somewhere inbetween all of the 6x
LITTLE cores, the 2x big cores are a little bit further away. So apart
from hooking up all of the cores for cooling for cpuss0 & cpuss1 I don't
have a great idea what to do here. Shall I do that?

Regards
Luca


> You can then assign multiple cpu cooling devices.
>
> LGTM otherwise!
>
>
> Konrad