Re: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events

2024-02-13 Thread kernel test robot



Hello,

kernel test robot noticed "BUG:KASAN:slab-use-after-free_in_user_events_ioctl" 
on:

commit: fecc001d587ceeeb47043c20353f257e3f01b39f ("[PATCH v2 1/4] 
tracing/user_events: Prepare find/delete for same name events")
url: 
https://github.com/intel-lab-lkp/linux/commits/Beau-Belgrave/tracing-user_events-Prepare-find-delete-for-same-name-events/20240203-031207
patch link: 
https://lore.kernel.org/all/20240202184449.1674-2-be...@linux.microsoft.com/
patch subject: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same 
name events

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

group: user_events



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz 
(Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-lkp/202402141240.cc5aba78-oliver.s...@intel.com


[ 106.969333][ T2278] BUG: KASAN: slab-use-after-free in user_events_ioctl 
(kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 
kernel/trace/trace_events_user.c:2543) 
[  106.970079][ T2278] Read of size 8 at addr 88816644ef38 by task 
abi_test/2278
[  106.970788][ T2278]
[  106.971058][ T2278] CPU: 2 PID: 2278 Comm: abi_test Not tainted 
6.7.0-rc8-1-gfecc001d587c #1
[  106.971881][ T2278] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 
Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
[  106.972829][ T2278] Call Trace:
[  106.973185][ T2278]  <TASK>
[ 106.973514][ T2278] dump_stack_lvl (lib/dump_stack.c:108) 
[ 106.973966][ T2278] print_address_description+0x2c/0x3a0 
[ 106.974597][ T2278] ? user_events_ioctl 
(kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 
kernel/trace/trace_events_user.c:2543) 
[ 106.975099][ T2278] print_report (mm/kasan/report.c:476) 
[ 106.975542][ T2278] ? kasan_addr_to_slab (mm/kasan/common.c:35) 
[ 106.976025][ T2278] ? user_events_ioctl 
(kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 
kernel/trace/trace_events_user.c:2543) 
[ 106.976531][ T2278] kasan_report (mm/kasan/report.c:590) 
[ 106.976978][ T2278] ? user_events_ioctl 
(kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 
kernel/trace/trace_events_user.c:2543) 
[ 106.977481][ T2278] user_events_ioctl (kernel/trace/trace_events_user.c:2067 
kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543) 
[ 106.977970][ T2278] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:871 
fs/ioctl.c:857 fs/ioctl.c:857) 
[ 106.978441][ T2278] do_syscall_64 (arch/x86/entry/common.c:52 
arch/x86/entry/common.c:83) 
[ 106.978889][ T2278] entry_SYSCALL_64_after_hwframe 
(arch/x86/entry/entry_64.S:129) 
[  106.979462][ T2278] RIP: 0033:0x7f2e121c8b5b
[ 106.979907][ T2278] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> 
c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
All code
========
   0:   00 48 89                add    %cl,-0x77(%rax)
   3:   44 24 18                rex.R and $0x18,%al
   6:   31 c0                   xor    %eax,%eax
   8:   48 8d 44 24 60          lea    0x60(%rsp),%rax
   d:   c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
  14:   48 89 44 24 08          mov    %rax,0x8(%rsp)
  19:   48 8d 44 24 20          lea    0x20(%rsp),%rax
  1e:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  23:   b8 10 00 00 00          mov    $0x10,%eax
  28:   0f 05                   syscall
  2a:*  89 c2                   mov    %eax,%edx        <-- trapping instruction
  2c:   3d 00 f0 ff ff          cmp    $0xf000,%eax
  31:   77 1c                   ja     0x4f
  33:   48 8b 44 24 18          mov    0x18(%rsp),%rax
  38:   64                      fs
  39:   48                      rex.W
  3a:   2b                      .byte 0x2b
  3b:   04 25                   add    $0x25,%al
  3d:   28 00                   sub    %al,(%rax)
...

Code starting with the faulting instruction
===========================================
   0:   89 c2                   mov    %eax,%edx
   2:   3d 00 f0 ff ff          cmp    $0xf000,%eax
   7:   77 1c                   ja     0x25
   9:   48 8b 44 24 18          mov    0x18(%rsp),%rax
   e:   64                      fs
   f:   48                      rex.W
  10:   2b                      .byte 0x2b
  11:   04 25                   add    $0x25,%al
  13:   28 00                   sub    %al,(%rax)
...
[  106.981608][ T2278] RSP: 002b:7ffcb0ba5ed0 EFLAGS: 0246 ORIG_RAX: 
0010
[  106.982385][ T2278] RAX: ffda RBX: 7ffcb0ba6228 RCX: 
7f2e121c8b5b
[  106.983128][ T2278] RDX: 564d434bc6fe RSI: 4

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-13 Thread Jarkko Sakkinen
On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:
> Hi Jarkko
>
> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> >> From: Kristen Carlson Accardi 
> >>
> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> >> reclaim pages used in the same cgroup to make room for new allocations.
> >> This is analogous to the behavior that the global reclaimer is triggered
> >> when the global usage is close to total available EPC.
> >>
> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> >> whether synchronous reclaim is allowed or not. And trigger the
> >> synchronous/asynchronous reclamation flow accordingly.
> >>
> >> Note at this point, all reclaimable EPC pages are still tracked in the
> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
> >> is activated yet.
> >>
> >> Co-developed-by: Sean Christopherson 
> >> Signed-off-by: Sean Christopherson 
> >> Signed-off-by: Kristen Carlson Accardi 
> >> Co-developed-by: Haitao Huang 
> >> Signed-off-by: Haitao Huang 
> >> ---
> >> V7:
> >> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> >> ---
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
> >>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
> >>  3 files changed, 27 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c  
> >> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> index d399fda2b55e..abf74fdb12b4 100644
> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> @@ -184,13 +184,35 @@ static void  
> >> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> >>  /**
> >>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC  
> >> page
> >>   * @epc_cg:   The EPC cgroup to be charged for the page.
> >> + * @reclaim:  Whether or not synchronous reclaim is allowed
> >>   * Return:
> >>   * * %0 - If successfully charged.
> >>   * * -errno - for failures.
> >>   */
> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
> >> reclaim)
> >>  {
> >> -  return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> >> +  for (;;) {
> >> +  if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> >> +  PAGE_SIZE))
> >> +  break;
> >> +
> >> +  if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> >> +  return -ENOMEM;
> >> +
> >> +  if (signal_pending(current))
> >> +  return -ERESTARTSYS;
> >> +
> >> +  if (!reclaim) {
> >> +  queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
> >> +  return -EBUSY;
> >> +  }
> >> +
> >> +  if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> >> +  /* All pages were too young to reclaim, try again a 
> >> little later */
> >> +  schedule();
> >
> > This will be total pain to backtrack after a while when something
> > needs to be changed so there definitely should be inline comments
> > addressing each branch condition.
> >
> > I'd rethink this as:
> >
> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
> >iteration with the new "reclaim" parameter.
> > 2. Add a new sgx_epc_group_try_charge_reclaim() function.
> >
> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> > same loop calling internal __sgx_epc_cgroup_try_charge() with
> > different parameters. That is totally acceptable.
> >
> > Please also add my suggested-by.
> >
> > BR, Jarkko
> >
> For #2:
> The only caller of this function, sgx_alloc_epc_page(), has the same
> boolean which is passed into this function.

I know. This would be a good opportunity to fix that up. Large patch
sets should try to make the best possible space for their feature and
thus also clean up the code base overall.

> If we separate it into sgx_epc_cgroup_try_charge() and  
> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the  
> if/else branches. So separation here seems not help?

Of course it does. It makes the code in that location self-documenting
and easier to remember what it does.

BR, Jarkko



Re: [PATCH v7 10/36] ftrace/function_graph: Pass fgraph_ops to function graph callbacks

2024-02-13 Thread Steven Rostedt
On Wed,  7 Feb 2024 00:09:21 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Steven Rostedt (VMware) 
> 
> Pass the fgraph_ops structure to the function graph callbacks. This will
> allow callbacks to add a descriptor to a fgraph_ops private field that will
> be added in the future and use it for the callbacks. This will be useful
> when more than one callback can be registered to the function graph tracer.
> 
> Signed-off-by: Steven Rostedt (VMware) 
> Signed-off-by: Masami Hiramatsu (Google) 
> ---
>  Changes in v2:
>   - cleanup to set argument name on function prototype.
> ---
>

This patch fails to compile without this change:

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e35a941a5af3..47b461b1cf7e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -381,7 +381,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
if (gops == &fgraph_stub)
continue;
 
-   if (gops->entryfunc(&trace))
+   if (gops->entryfunc(&trace, gops))
bitmap |= BIT(i);
}
 


-- Steve



Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions

2024-02-13 Thread kernel test robot
Hi Vincent,

kernel test robot noticed the following build errors:

[auto build test ERROR on ca185770db914869ff9fe773bac5e0e5e4165b83]

url:
https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/ring-buffer-Zero-ring-buffer-sub-buffers/20240213-195302
base:   ca185770db914869ff9fe773bac5e0e5e4165b83
patch link:
https://lore.kernel.org/r/20240213114945.3528801-3-vdonnefort%40google.com
patch subject: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping 
functions
config: i386-buildonly-randconfig-001-20240214 
(https://download.01.org/0day-ci/archive/20240214/202402140910.tfs9k0yr-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240214/202402140910.tfs9k0yr-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202402140910.tfs9k0yr-...@intel.com/

All errors (new ones prefixed by >>):

>> kernel/trace/ring_buffer.c:6185:2: error: member reference type 'struct 
>> mutex' is not a pointer; did you mean to use '.'?
6185 | lockdep_assert_held(cpu_buffer->mapping_lock);
 | ^
   include/linux/lockdep.h:267:17: note: expanded from macro 
'lockdep_assert_held'
 267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
 | ~~~^~
   include/linux/lockdep.h:234:52: note: expanded from macro 'lockdep_is_held'
 234 | #define lockdep_is_held(lock)   
lock_is_held(&(lock)->dep_map)
 | ^
   include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
 261 | do { WARN_ON(debug_locks && !(cond)); } while (0)
 |  ~^~
   include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
 123 | int __ret_warn_on = !!(condition);   
   \
 |^
>> kernel/trace/ring_buffer.c:6185:2: error: cannot take the address of an 
>> rvalue of type 'struct lockdep_map'
6185 | lockdep_assert_held(cpu_buffer->mapping_lock);
 | ^
   include/linux/lockdep.h:267:17: note: expanded from macro 
'lockdep_assert_held'
 267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
 | ~~~^~
   include/linux/lockdep.h:234:45: note: expanded from macro 'lockdep_is_held'
 234 | #define lockdep_is_held(lock)   
lock_is_held(&(lock)->dep_map)
 |  ^
   include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
 261 | do { WARN_ON(debug_locks && !(cond)); } while (0)
 |  ~^~
   include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
 123 | int __ret_warn_on = !!(condition);   
   \
 |^
   2 errors generated.


vim +6185 kernel/trace/ring_buffer.c

  6174  
  6175  /*
  6176   * Fast-path for rb_buffer_(un)map(). Called whenever the meta-page 
doesn't need
  6177   * to be set-up or torn-down.
  6178   */
  6179  static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
  6180 struct ring_buffer_per_cpu *cpu_buffer,
  6181 bool inc)
  6182  {
  6183  unsigned long flags;
  6184  
> 6185  lockdep_assert_held(cpu_buffer->mapping_lock);
  6186  
  6187  if (inc && cpu_buffer->mapped == UINT_MAX)
  6188  return -EBUSY;
  6189  
  6190  if (WARN_ON(!inc && cpu_buffer->mapped == 0))
  6191  return -EINVAL;
  6192  
  6193  mutex_lock(&buffer->mutex);
  6194  raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
  6195  
  6196  if (inc)
  6197  cpu_buffer->mapped++;
  6198  else
  6199  cpu_buffer->mapped--;
  6200  
  6201  raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
  6202  mutex_unlock(&buffer->mutex);
  6203  
  6204  return 0;
  6205  }
  6206  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation

2024-02-13 Thread Tim Chen
On Tue, 2024-02-13 at 11:52 -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
> 
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
> 
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
> 
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
> 
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.
> 
> Link: 
> https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
> 
> Signed-off-by: Steven Rostedt (Google) 


Patch looks good to me.

Reviewed-by: Tim Chen 

> ---
> Changes since v1: 
> https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/
> 
> -- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.
> 




Re: [PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation

2024-02-13 Thread Google
On Tue, 13 Feb 2024 11:52:32 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
> 
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
> 
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
> 
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
> 
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.
> 
> Link: 
> https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
> 

OK, this looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you,

> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v1: 
> https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/
> 
> -- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.
> 
>  kernel/trace/trace_sched_switch.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/trace_sched_switch.c 
> b/kernel/trace/trace_sched_switch.c
> index e4fbcc3bede5..5f3e9bc87079 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -175,6 +175,10 @@ struct saved_cmdlines_buffer {
>  };
>  static struct saved_cmdlines_buffer *savedcmd;
>  
> +/* Holds the size of a cmdline and pid element */
> +#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \
> + (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
> +
>  static inline char *get_saved_cmdlines(int idx)
>  {
>   return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
> @@ -201,7 +205,7 @@ static struct saved_cmdlines_buffer 
> *allocate_cmdlines_buffer(unsigned int val)
>   int order;
>  
>   /* Figure out how much is needed to hold the given number of cmdlines */
> - orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> + orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
>   order = get_order(orig_size);
>   size = 1 << (order + PAGE_SHIFT);
>   page = alloc_pages(GFP_KERNEL, order);
> @@ -212,16 +216,11 @@ static struct saved_cmdlines_buffer 
> *allocate_cmdlines_buffer(unsigned int val)
>   memset(s, 0, sizeof(*s));
>  
>   /* Round up to actual allocation */
> - val = (size - sizeof(*s)) / TASK_COMM_LEN;
> + val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
>   s->cmdline_num = val;
>  
> - s->map_cmdline_to_pid = kmalloc_array(val,
> -   sizeof(*s->map_cmdline_to_pid),
> -   GFP_KERNEL);
> - if (!s->map_cmdline_to_pid) {
> - free_saved_cmdlines_buffer(s);
> - return NULL;
> - }
> + /* Place map_cmdline_to_pid array right after saved_cmdlines */
> + s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * 
> TASK_COMM_LEN];
>  
>   s->cmdline_idx = 0;
>   memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
> -- 
> 2.43.0
> 
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions

2024-02-13 Thread Steven Rostedt
On Tue, 13 Feb 2024 15:53:09 -0500
Steven Rostedt  wrote:

> On Tue, 13 Feb 2024 11:49:41 +
> Vincent Donnefort  wrote:
> 
> Did you test with lockdep?
> 
> > +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> > +  struct ring_buffer_per_cpu *cpu_buffer,
> > +  bool inc)
> > +{
> > +   unsigned long flags;
> > +
> > +   lockdep_assert_held(cpu_buffer->mapping_lock);  
> 
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function 
> ‘__rb_inc_dec_mapped’:
> /work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type 
> argument of ‘->’ (have ‘struct mutex’)
>   234 | #define lockdep_is_held(lock)   lock_is_held(&(lock)->dep_map)
>   | ^~
> /work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in 
> definition of macro ‘WARN_ON’
>   123 | int __ret_warn_on = !!(condition);
>   \
>   |^
> /work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion 
> of macro ‘lockdep_assert’
>   267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
>   | ^~
> /work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion 
> of macro ‘lockdep_is_held’
>   267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
>   |^~~
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in 
> expansion of macro ‘lockdep_assert_held’
>  6167 | lockdep_assert_held(cpu_buffer->mapping_lock);
>   | ^~~
> 
> I believe that is supposed to be:
> 
>   lockdep_assert_held(&cpu_buffer->mapping_lock);

If this is the only issue with this series, I may just fix up the patch
myself.



Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-13 Thread Rob Herring
On Tue, Feb 13, 2024 at 02:37:49PM -0600, Tanmay Shah wrote:
> Hello,
> 
> Thanks for the reviews, please find my comments below.
> 
> On 2/13/24 1:20 PM, Rob Herring wrote:
> > On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote:
> > > From: Radhey Shyam Pandey 
> > > 
> > > Introduce bindings for TCM memory address space on AMD-xilinx Zynq
> > > UltraScale+ platform. It will help in defining TCM in device-tree
> > > and make its access platform agnostic and data-driven.
> > > 
> > > Tightly-coupled memories (TCMs) are low-latency memories that provide
> > > predictable instruction execution and predictable data load/store
> > > timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
> > > banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> > > 
> > > The TCM resources(reg, reg-names and power-domain) are documented for
> > > each TCM in the R5 node. The reg and reg-names are made as required
> > > properties as we don't want to hardcode TCM addresses for future
> > > platforms and for zu+ legacy implementation will ensure that the
> > > old dts w/o reg/reg-names works and stable ABI is maintained.
> > > 
> > > It also extends the examples for TCM split and lockstep modes.
> > > 
> > > Signed-off-by: Radhey Shyam Pandey 
> > > Signed-off-by: Tanmay Shah 
> > > ---
> > > 
> > > Changes in v10:
> > >   - modify number of "reg", "reg-names" and "power-domains" entries
> > > based on cluster mode
> > >   - Add extra optional atcm and btcm in "reg" property for lockstep mode
> > >   - Add "reg-names" for extra optional atcm and btcm for lockstep mode
> > >   - Drop previous Ack as bindings has new change
> > > 
> > > Changes in v9:
> > >   - None
> > > Changes in v8:
> > >   - None
> > > Changes in v7:
> > >   - None
> > > Changes in v6:
> > >   - None
> > > Changes in v5:
> > >   - None
> > > 
> > > Changes in v4:
> > >   - Use address-cells and size-cells value 2
> > >   - Modify ranges property as per new value of address-cells
> > > and size-cells
> > >   - Modify child node "reg" property accordingly
> > >   - Remove previous ack for further review
> > > 
> > > v4 link: 
> > > https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/
> > > 
> > >  .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
> > >  1 file changed, 170 insertions(+), 22 deletions(-)
> > > 
> >
> > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> > on your patch (DT_CHECKER_FLAGS is new in v5.13):
> >
> > yamllint warnings/errors:
> > ./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13:
> >  [warning] wrong indentation: expected 10 but found 12 (indentation)
> Ack. I will fix this.
> 
> However, can I still get reviews on the patch itself, so if something else
> needs to be fixed I can fix it in the next revision as well.
> 
> Also, I tried to run yamllint with following command:
> 
> make DT_CHECKER_FLAGS=-m dt_binding_check 
> DT_SCHEMA_FILES=Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
>  O=../build/zynqmp/linux-next/
> 
> 
> However, I see following logs without any error on bindings:
> 
>   LINT    Documentation/devicetree/bindings
> invalid config: unknown option "required" for rule "quoted-strings"
xargs: /usr/bin/yamllint: exited with status 255; aborting
>   CHKDT   Documentation/devicetree/bindings/processed-schema.json
>   SCHEMA  Documentation/devicetree/bindings/processed-schema.json
>   DTEX    
> Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dts
>   DTC_CHK 
> Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dtb
> 
> I am not sure if my system is missing something but, yamllint tool is failing.

"unknown option" means old version of yamllint.

Rob



Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-02-13 Thread Andrew Kanner
On Thu, Feb 01, 2024 at 10:13:54AM -0800, Luis Chamberlain wrote:
> 
> While you're at it, if you want to try it, you could see if you can
> improve the situation more by looking at symbol_get() users that remain
> and seeing if you can instead fix it with proper Kconfig dependency and
> at build time. Then we can just remove it as well.
> 
>   Luis

Sorry for the late reply.

Luis, can you give more details of your idea? I re-read it once, then
came back and still don't understand.

I see that there are ~10 users for symbol_get() currently. Do you want
to stringify symbol names at build time to completely remove
symbol_get() from module.h? Correct me if I'm wrong, since using a
function which is not declared anywhere sounds confusing.

-- 
Andrew Kanner



Re: CPU data cache across reboot/kexec for pmem/dax devices

2024-02-13 Thread Mathieu Desnoyers

On 2024-02-09 15:15, Dan Williams wrote:

> Mathieu Desnoyers wrote:
> 
>> Hi Dan,
>> 
>> In the context of extracting user-space trace data when the kernel crashes,
>> the LTTng user-space tracer recommends using nvdimm/pmem to reserve an area
>> of physical (volatile) RAM at boot (memmap=nn[KMG]!ss[KMG]), and use the
>> resulting device to create/mount a dax-enabled fs (e.g. ext4).
>> 
>> We then use this filesystem to mmap() the shared memory files for the tracer.
>> 
>> I want to make sure that the very last events from the userspace tracer written
>> to the memory mapped buffers (mmap()) by userspace are present after a
>> warm-reboot (or kexec/kdump).
>> 
>> Note that the LTTng user-space tracer (LTTng-UST) does *not* issue any clflush
>> (or equivalent pmem_persist() from libpmem) for performance reasons: ring buffer
>> data is usually overwritten many times before the system actually crashes, and
>> the only thing we really need to make sure is that the cache lines are not
>> invalidated without write back.
>> 
>> So I understand that the main use-case for pmem is nvdimm, and that in order to
>> guarantee persistence of the data on power off an explicit pmem_persist() is
>> needed after each "transaction", but for the needs of tracing, is there some
>> kind of architectural guarantee that the data present in the cpu data cache
>> is not invalidated prior to write back in each of those scenarios ?
>> 
>> - reboot with bios explicitly not clearing memory,
> 
> This one gives me pause, because a trip through the BIOS typically means
> lots of resets and other low level magic, so this would likely require
> pushing dirty data out of CPU caches prior to entering the BIOS code
> paths.
> 
> So this either needs explicit cache flushing or mapping the memory with
> write-through semantics. That latter one is not supported in the stack
> today.


I would favor the explicit cache flushing approach over write-through semantics
for performance reasons: typically the ring buffers are overwritten, so always
storing the data to memory would be wasteful.

But I would rather do the cache flushing only on crash (die oops/panic notifiers)
rather than require the tracer to flush cache lines immediately after each event
is produced, again for performance reasons.

I have done a crude attempt at registering die oops/panic notifiers from the pmem
driver, and flush all pmem areas to memory when die oops/panic notifiers are
called (see the untested patch below). I wonder if this is something that would
be generally useful ?




>> - kexec/kdump.
> 
> This should maintain the state of CPU caches. As far as the CPU is
> concerned it is just long jumping into a new kernel in memory without
> resetting any CPU cache state.


Good to know that this scenario is expected to "Just Work".

Thanks,

Mathieu


diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e9898457a7bd..b92f18607d39 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,12 +26,18 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "pmem.h"
 #include "btt.h"
 #include "pfn.h"
 #include "nd.h"
 
+static int pmem_die_handler(struct notifier_block *self,
+   unsigned long ev, void *unused);
+static int pmem_panic_handler(struct notifier_block *self,
+   unsigned long ev, void *unused);
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
/*
@@ -423,6 +429,9 @@ static void pmem_release_disk(void *__pmem)
 {
struct pmem_device *pmem = __pmem;
 
+   atomic_notifier_chain_unregister(&panic_notifier_list,
+   &pmem->panic_notifier);
+   unregister_die_notifier(&pmem->die_notifier);
dax_remove_host(pmem->disk);
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);
@@ -573,9 +582,20 @@ static int pmem_attach_disk(struct device *dev,
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
}
-   rc = device_add_disk(dev, disk, pmem_attribute_groups);
+   pmem->die_notifier.notifier_call = pmem_die_handler;
+   pmem->die_notifier.priority = -INT_MAX;
+   rc = register_die_notifier(&pmem->die_notifier);
if (rc)
goto out_remove_host;
+   pmem->panic_notifier.notifier_call = pmem_panic_handler;
+   pmem->panic_notifier.priority = -INT_MAX;
+   rc = atomic_notifier_chain_register(&panic_notifier_list,
+   &pmem->panic_notifier);
+   if (rc)
+   goto out_unregister_die;
+   rc = device_add_disk(dev, disk, pmem_attribute_groups);
+   if (rc)
+   goto out_unregister_panic;
if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
return -ENOMEM;
 
@@ -587,6 +607,11 @@ static int pmem_attach_disk(struct device *dev,

dev_warn(dev, "'badblocks' notification disabled\n");
return 0;
 
+out_unregister_panic:
+   atomic_notifier_chain_unregister(&panic_notifier_list,
+   &pmem->panic_notifier);
+out_unregister_d

Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions

2024-02-13 Thread Steven Rostedt
On Tue, 13 Feb 2024 11:49:41 +
Vincent Donnefort  wrote:

Did you test with lockdep?

> +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> +struct ring_buffer_per_cpu *cpu_buffer,
> +bool inc)
> +{
> + unsigned long flags;
> +
> + lockdep_assert_held(cpu_buffer->mapping_lock);

/work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function 
‘__rb_inc_dec_mapped’:
/work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type 
argument of ‘->’ (have ‘struct mutex’)
  234 | #define lockdep_is_held(lock)   lock_is_held(&(lock)->dep_map)
  | ^~
/work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in definition 
of macro ‘WARN_ON’
  123 | int __ret_warn_on = !!(condition);  
\
  |^
/work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion of 
macro ‘lockdep_assert’
  267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
  | ^~
/work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion of 
macro ‘lockdep_is_held’
  267 | lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
  |^~~
/work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in expansion 
of macro ‘lockdep_assert_held’
 6167 | lockdep_assert_held(cpu_buffer->mapping_lock);
  | ^~~

I believe that is supposed to be:

lockdep_assert_held(&cpu_buffer->mapping_lock);

-- Steve

> +
> + if (inc && cpu_buffer->mapped == UINT_MAX)
> + return -EBUSY;
> +
> + if (WARN_ON(!inc && cpu_buffer->mapped == 0))
> + return -EINVAL;
> +
> + mutex_lock(&buffer->mutex);
> + raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
> + if (inc)
> + cpu_buffer->mapped++;
> + else
> + cpu_buffer->mapped--;
> +
> + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> + mutex_unlock(&buffer->mutex);
> +
> + return 0;
> +}



Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-13 Thread Tanmay Shah
Hello,

Thanks for the reviews; please find my comments below.

On 2/13/24 1:20 PM, Rob Herring wrote:
> On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote:
> > From: Radhey Shyam Pandey 
> > 
> > Introduce bindings for TCM memory address space on AMD-xilinx Zynq
> > UltraScale+ platform. It will help in defining TCM in device-tree
> > and make its access platform agnostic and data-driven.
> > 
> > Tightly-coupled memories (TCMs) are low-latency memories that provide
> > predictable instruction execution and predictable data load/store
> > timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
> > banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> > 
> > The TCM resources (reg, reg-names and power-domain) are documented for
> > each TCM in the R5 node. The reg and reg-names are made required
> > properties, as we don't want to hardcode TCM addresses for future
> > platforms; for zu+, the legacy implementation will ensure that the
> > old dts without reg/reg-names still works and a stable ABI is maintained.
> > 
> > It also extends the examples for TCM split and lockstep modes.
> > 
> > Signed-off-by: Radhey Shyam Pandey 
> > Signed-off-by: Tanmay Shah 
> > ---
> > 
> > Changes in v10:
> >   - modify number of "reg", "reg-names" and "power-domains" entries
> > based on cluster mode
> >   - Add extra optional atcm and btcm in "reg" property for lockstep mode
> >   - Add "reg-names" for extra optional atcm and btcm for lockstep mode
> >   - Drop previous Ack as bindings has new change
> > 
> > Changes in v9:
> >   - None
> > Changes in v8:
> >   - None
> > Changes in v7:
> >   - None
> > Changes in v6:
> >   - None
> > Changes in v5:
> >   - None
> > 
> > Changes in v4:
> >   - Use address-cells and size-cells value 2
> >   - Modify ranges property as per new value of address-cells
> > and size-cells
> >   - Modify child node "reg" property accordingly
> >   - Remove previous ack for further review
> > 
> > v4 link: 
> > https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/
> > 
> >  .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
> >  1 file changed, 170 insertions(+), 22 deletions(-)
> > 
>
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
>
> yamllint warnings/errors:
> ./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13: 
> [warning] wrong indentation: expected 10 but found 12 (indentation)
Ack. I will fix this.

However, can I still get reviews on the patch itself, so that if something else
needs to be fixed I can address it in the next revision as well?

Also, I tried to run yamllint with following command:

make DT_CHECKER_FLAGS=-m dt_binding_check 
DT_SCHEMA_FILES=Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
 O=../build/zynqmp/linux-next/


However, I see following logs without any error on bindings:

  LINT    Documentation/devicetree/bindings
invalid config: unknown option "required" for rule "quoted-strings"
*xargs: /usr/bin/yamllint: exited with status 255; aborting*
  CHKDT   Documentation/devicetree/bindings/processed-schema.json
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  DTEX    
Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dts
  DTC_CHK 
Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dtb

I am not sure if my system is missing something, but the yamllint tool is failing.

I would appreciate it if someone could point me to a quick fix if this is a
known issue that has already been resolved.

Thanks,

Tanmay

> dtschema/dtc warnings/errors:
>
> doc reference errors (make refcheckdocs):
>
> See 
> https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240213175450.3097308-3-tanmay.s...@amd.com
>
> The base for the series is generally the latest rc1. A different dependency
> should be noted in *this* patch.
>
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:
>
> pip3 install dtschema --upgrade
>
> Please check and re-submit after running the above command yourself. Note
> that DT_SCHEMA_FILES can be set to your schema file to speed up checking
> your schema. However, it must be unset to test all examples with your schema.
>



Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-13 Thread Rob Herring


On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote:
> From: Radhey Shyam Pandey 
> 
> Introduce bindings for TCM memory address space on AMD-xilinx Zynq
> UltraScale+ platform. It will help in defining TCM in device-tree
> and make its access platform agnostic and data-driven.
> 
> Tightly-coupled memories (TCMs) are low-latency memories that provide
> predictable instruction execution and predictable data load/store
> timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
> banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> 
> The TCM resources (reg, reg-names and power-domain) are documented for
> each TCM in the R5 node. The reg and reg-names are made required
> properties, as we don't want to hardcode TCM addresses for future
> platforms; for zu+, the legacy implementation will ensure that the
> old dts without reg/reg-names still works and a stable ABI is maintained.
> 
> It also extends the examples for TCM split and lockstep modes.
> 
> Signed-off-by: Radhey Shyam Pandey 
> Signed-off-by: Tanmay Shah 
> ---
> 
> Changes in v10:
>   - modify number of "reg", "reg-names" and "power-domains" entries
> based on cluster mode
>   - Add extra optional atcm and btcm in "reg" property for lockstep mode
>   - Add "reg-names" for extra optional atcm and btcm for lockstep mode
>   - Drop previous Ack as bindings has new change
> 
> Changes in v9:
>   - None
> Changes in v8:
>   - None
> Changes in v7:
>   - None
> Changes in v6:
>   - None
> Changes in v5:
>   - None
> 
> Changes in v4:
>   - Use address-cells and size-cells value 2
>   - Modify ranges property as per new value of address-cells
> and size-cells
>   - Modify child node "reg" property accordingly
>   - Remove previous ack for further review
> 
> v4 link: 
> https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/
> 
>  .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
>  1 file changed, 170 insertions(+), 22 deletions(-)
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:
./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13: 
[warning] wrong indentation: expected 10 but found 12 (indentation)

dtschema/dtc warnings/errors:

doc reference errors (make refcheckdocs):

See 
https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240213175450.3097308-3-tanmay.s...@amd.com

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.




Re: [Linux-stm32] [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-02-13 Thread Mathieu Poirier
On Tue, 13 Feb 2024 at 08:48, Arnaud POULIQUEN
 wrote:
>
> Hello Mathieu,
>
> On 2/5/24 10:13, Arnaud POULIQUEN wrote:
> >
> >
> > On 2/2/24 20:53, Mathieu Poirier wrote:
> >> On Thu, Feb 01, 2024 at 07:33:35PM +0100, Arnaud POULIQUEN wrote:
> >>>
> >>>
> >>> On 2/1/24 17:02, Mathieu Poirier wrote:
>  On Thu, Feb 01, 2024 at 04:06:37PM +0100, Arnaud POULIQUEN wrote:
> > hello Mathieu,
> >
> > On 1/31/24 19:52, Mathieu Poirier wrote:
> >> On Tue, Jan 30, 2024 at 10:13:48AM +0100, Arnaud POULIQUEN wrote:
> >>>
> >>>
> >>> On 1/26/24 18:11, Mathieu Poirier wrote:
>  On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote:
> > The new TEE remoteproc device is used to manage remote firmware in a
> > secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
> > introduced to delegate the loading of the firmware to the trusted
> > execution context. In such cases, the firmware should be signed and
> > adhere to the image format defined by the TEE.
> >
> > Signed-off-by: Arnaud Pouliquen 
> > ---
> > V1 to V2 update:
> > - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as 
> > detected by
> >   the kernel test robot:
> >  WARNING: unmet direct dependencies detected for TEE_REMOTEPROC
> >  Depends on [n]: REMOTEPROC [=y] && OPTEE [=n]
> >  Selected by [y]:
> >  - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && 
> > REMOTEPROC [=y]
> > - Fix initialized trproc variable in  stm32_rproc_probe
> > ---
> >  drivers/remoteproc/stm32_rproc.c | 149 
> > +--
> >  1 file changed, 144 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/remoteproc/stm32_rproc.c 
> > b/drivers/remoteproc/stm32_rproc.c
> > index fcc0001e2657..cf6a21bac945 100644
> > --- a/drivers/remoteproc/stm32_rproc.c
> > +++ b/drivers/remoteproc/stm32_rproc.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  #include "remoteproc_internal.h"
> > @@ -49,6 +50,9 @@
> >  #define M4_STATE_STANDBY 4
> >  #define M4_STATE_CRASH   5
> >
> > +/* Remote processor unique identifier aligned with the Trusted 
> > Execution Environment definitions */
> > +#define STM32_MP1_M4_PROC_ID0
> > +
> >  struct stm32_syscon {
> >   struct regmap *map;
> >   u32 reg;
> > @@ -90,6 +94,8 @@ struct stm32_rproc {
> >   struct stm32_mbox mb[MBOX_NB_MBX];
> >   struct workqueue_struct *workqueue;
> >   bool hold_boot_smc;
> > + bool fw_loaded;
> > + struct tee_rproc *trproc;
> >   void __iomem *rsc_va;
> >  };
> >
> > @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc 
> > *rproc)
> >   return err;
> >  }
> >
> > +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc,
> > + const struct firmware *fw)
> > +{
> > + struct stm32_rproc *ddata = rproc->priv;
> > + unsigned int ret = 0;
> > +
> > + if (rproc->state == RPROC_DETACHED)
> > + return 0;
> > +
> > + ret = tee_rproc_load_fw(ddata->trproc, fw);
> > + if (!ret)
> > + ddata->fw_loaded = true;
> > +
> > + return ret;
> > +}
> > +
> > +static int stm32_rproc_tee_elf_load(struct rproc *rproc,
> > + const struct firmware *fw)
> > +{
> > + struct stm32_rproc *ddata = rproc->priv;
> > + unsigned int ret;
> > +
> > + /*
> > +  * This function can be called by remote proc for recovery
> > +  * without the sanity check. In this case we need to load the 
> > firmware
> > +  * else nothing done here as the firmware has been preloaded 
> > for the
> > +  * sanity check to be able to parse it for the resource table.
> > +  */
> 
>  This comment is very confusing - please consider refactoring.
> 
> > + if (ddata->fw_loaded)
> > + return 0;
> > +
> 
>  I'm not sure about keeping a flag to indicate the status of the 
>  loaded firmware.
>  It is not done for the non-secure method, I don't see why it would 
>  be needed for
>  the secure one.
> 
> >>>
> >>> The difference is on the sanity check.
> >>>

[PATCH] MAINTAINERS: add Eugenio Pérez as reviewer

2024-02-13 Thread Eugenio Pérez
Add myself as a reviewer of some VirtIO areas I'm interested in.

Until now I've been manually scanning the list looking for
series that touch these areas. Adding myself will make this task easier.

Signed-off-by: Eugenio Pérez 
---
 MAINTAINERS | 4 
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 61117c3afa80..d0789ecc4f70 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23219,6 +23219,7 @@ M:  "Michael S. Tsirkin" 
 M: Jason Wang 
 R: Paolo Bonzini 
 R: Stefan Hajnoczi 
+R: Eugenio Pérez 
 L: virtualizat...@lists.linux.dev
 S: Maintained
 F: drivers/block/virtio_blk.c
@@ -23237,6 +23238,7 @@ VIRTIO CORE AND NET DRIVERS
 M: "Michael S. Tsirkin" 
 M: Jason Wang 
 R: Xuan Zhuo 
+R: Eugenio Pérez 
 L: virtualizat...@lists.linux.dev
 S: Maintained
 F: Documentation/ABI/testing/sysfs-bus-vdpa
@@ -23277,6 +23279,7 @@ VIRTIO FILE SYSTEM
 M: Vivek Goyal 
 M: Stefan Hajnoczi 
 M: Miklos Szeredi 
+R: Eugenio Pérez 
 L: virtualizat...@lists.linux.dev
 L: linux-fsde...@vger.kernel.org
 S: Supported
@@ -23310,6 +23313,7 @@ F:  include/uapi/linux/virtio_gpu.h
 VIRTIO HOST (VHOST)
 M: "Michael S. Tsirkin" 
 M: Jason Wang 
+R: Eugenio Pérez 
 L: k...@vger.kernel.org
 L: virtualizat...@lists.linux.dev
 L: net...@vger.kernel.org
-- 
2.43.0




Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

2024-02-13 Thread Jakub Kicinski
On Tue, 13 Feb 2024 14:57:49 +0100 Eric Dumazet wrote:
> Please note that adding other sysfs entries is expensive for workloads
> creating/deleting netdev and netns often.
> 
> I _think_ we should find a way for not creating
> /sys/class/net//queues/tx-{Q}/byte_queue_limits  directory
> and files
> for non BQL enabled devices (like loopback !)

We should try, see if anyone screams. We could use IFF_NO_QUEUE, and
NETIF_F_LLTX as a proxy for "device doesn't have a real queue so BQL 
would be pointless"? Obviously better to annotate the drivers which
do have BQL support, but there's >50 of them on a quick count..



[PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-13 Thread Tanmay Shah
From: Radhey Shyam Pandey 

Introduce bindings for TCM memory address space on AMD-xilinx Zynq
UltraScale+ platform. It will help in defining TCM in device-tree
and make its access platform agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domain) are documented for
each TCM in the R5 node. The reg and reg-names are made required
properties, as we don't want to hardcode TCM addresses for future
platforms; for zu+, the legacy implementation will ensure that the
old dts without reg/reg-names still works and a stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Tanmay Shah 
---

Changes in v10:
  - modify number of "reg", "reg-names" and "power-domains" entries
based on cluster mode
  - Add extra optional atcm and btcm in "reg" property for lockstep mode
  - Add "reg-names" for extra optional atcm and btcm for lockstep mode
  - Drop previous Ack as bindings has new change

Changes in v9:
  - None
Changes in v8:
  - None
Changes in v7:
  - None
Changes in v6:
  - None
Changes in v5:
  - None

Changes in v4:
  - Use address-cells and size-cells value 2
  - Modify ranges property as per new value of address-cells
and size-cells
  - Modify child node "reg" property accordingly 
  - Remove previous ack for further review

v4 link: 
https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
 1 file changed, 170 insertions(+), 22 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml 
b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..2de74307821e 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -20,9 +20,21 @@ properties:
   compatible:
 const: xlnx,zynqmp-r5fss
 
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
+
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
+default: 1
 description: |
   The RPU MPCore can operate in split mode (Dual-processor performance), 
Safety
   lock-step mode(Both RPU cores execute the same code in lock-step,
@@ -37,7 +49,7 @@ properties:
   2: single cpu mode
 
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r5f@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -54,9 +66,6 @@ patternProperties:
   compatible:
 const: xlnx,zynqmp-r5f
 
-  power-domains:
-maxItems: 1
-
   mboxes:
 minItems: 1
 items:
@@ -101,35 +110,174 @@ patternProperties:
 
 required:
   - compatible
-  - power-domains
 
-unevaluatedProperties: false
+allOf:
+  - if:
+  properties:
+xlnx,cluster-mode:
+enum:
+  - 1
+then:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+- description: extra ATCM memory in lockstep mode
+- description: extra BTCM memory in lockstep mode
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+- const: atcm1
+- const: btcm1
+
+power-domains:
+  minItems: 2
+  maxItems: 5
+
+  required:
+- reg
+- reg-names
+- power-domains
+
+else:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+
+power-domains:
+  minItems: 2
+  maxItems: 3
+
+  required:
+- reg
+- reg-names
+- power-domains
 
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
 
 additionalProperties: false
 
 examples:
   - |
-remoteproc {
-co

[PATCH v10 3/4] dts: zynqmp: add properties for TCM in remoteproc

2024-02-13 Thread Tanmay Shah
Add properties as per the new bindings in the zynqmp remoteproc node
to represent TCM address and size.

This patch also adds an alternative remoteproc node to represent the
remoteproc cluster in split mode. By default lockstep mode is
enabled, and users should disable it before using the split mode
dts. Both device-tree nodes can't be used simultaneously; one
of them must be disabled. For the zcu102-1.0 and zcu102-1.1 boards
the remoteproc split mode dts node is enabled and the lockstep mode
dts is disabled.

Signed-off-by: Tanmay Shah 
---

Changes in v10:
  - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
  - Add each tcm's power-domains in lockstep mode
  - Drop previous Ack as new change in this patchset

Changes in v9:
  - fix rproc lockstep dts


 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 65 +--
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts 
b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
+&rproc_split {
+   status = "okay";
+};
+
+&rproc_lockstep {
+   status = "disabled";
+};
+
 &eeprom {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index eaba466804bc..c8a7fd0f3a1e 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -248,19 +248,74 @@ fpga_full: fpga-full {
ranges;
};
 
-   remoteproc {
rproc_lockstep: remoteproc@ffe00000 {
compatible = "xlnx,zynqmp-r5fss";
xlnx,cluster-mode = <1>;
 
-   r5f-0 {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+<0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+<0x0 0x10000 0x0 0xffe10000 0x0 0x10000>,
+<0x0 0x30000 0x0 0xffe30000 0x0 0x10000>;
+
+   r5f@0 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x0 0x0 0x0 0x10000>,
+ <0x0 0x20000 0x0 0x10000>,
+ <0x0 0x10000 0x0 0x10000>,
+ <0x0 0x30000 0x0 0x10000>;
+   reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+   power-domains = <&zynqmp_firmware PD_RPU_0>,
+   <&zynqmp_firmware PD_R5_0_ATCM>,
+   <&zynqmp_firmware PD_R5_0_BTCM>,
+   <&zynqmp_firmware PD_R5_1_ATCM>,
+   <&zynqmp_firmware PD_R5_1_BTCM>;
+   memory-region = <&rproc_0_fw_image>;
+   };
+
+   r5f@1 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <&zynqmp_firmware PD_RPU_1>,
+   <&zynqmp_firmware PD_R5_1_ATCM>,
+   <&zynqmp_firmware PD_R5_1_BTCM>;
+   memory-region = <&rproc_1_fw_image>;
+   };
+   };
+
+   rproc_split: remoteproc-split@ffe00000 {
+   status = "disabled";
+   compatible = "xlnx,zynqmp-r5fss";
+   xlnx,cluster-mode = <0>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+<0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+<0x1 0x0 0x0 0xffe90000 0x0 0x10000>,
+<0x1 0x20000 0x0 0xffeb0000 0x0 0x10000>;
+
+   r5f@0 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <&zynqmp_firmware PD_RPU_0>;
+   reg = <0x0 0x0 0x0 0x10000>, <0x0 0x20000 0x0 0x10000>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <&zynqmp_firmware PD_RPU_0>,
+   <&zynqmp_firmware PD_R5_0_ATCM>,
+   <&zynqmp_firmware PD_R5_0_BTCM>;
memory-region = <&rproc_0_fw_image>;
};
 
-   r5f-1 {
+   r5f@1 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <&zynqmp_firmware PD_RPU_1>;
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x20

[PATCH v10 4/4] remoteproc: zynqmp: parse TCM from device tree

2024-02-13 Thread Tanmay Shah
ZynqMP TCM information was hardcoded in the driver. Now ZynqMP TCM
information is available in the device-tree. Parse TCM information in
the driver as per the new bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v10:
  - Remove redundant changes to handle TCM in lockstep mode

Changes in v9:
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.

Changes in v8:
  - Remove pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation

 drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++--
 1 file changed, 107 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 42b0384d34f2..49e8eaf83fce 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
@@ -757,6 +757,103 @@ static struct zynqmp_r5_core 
*zynqmp_r5_add_rproc_core(struct device *cdev)
return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+   int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+   struct of_phandle_args out_args = {0};
+   struct zynqmp_r5_core *r5_core;
+   struct platform_device *cpdev;
+   struct mem_bank_data *tcm;
+   struct device_node *np;
+   struct resource *res;
+   u64 abs_addr, size;
+   struct device *dev;
+
+   for (i = 0; i < cluster->core_count; i++) {
+   r5_core = cluster->r5_cores[i];
+   dev = r5_core->dev;
+   np = r5_core->np;
+
+   pd_count = of_count_phandle_with_args(np, "power-domains",
+ "#power-domain-cells");
+
+   if (pd_count <= 0) {
+   dev_err(dev, "invalid power-domains property, %d\n", pd_count);
+   return -EINVAL;
+   }
+
+   /* First entry in power-domains list is for r5 core, rest for TCM. */
+   tcm_bank_count = pd_count - 1;
+
+   if (tcm_bank_count <= 0) {
+   dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+   return -EINVAL;
+   }
+
+   r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+ sizeof(struct mem_bank_data *),
+ GFP_KERNEL);
+   if (!r5_core->tcm_banks)
+   return -ENOMEM;
+
+   r5_core->tcm_bank_count = tcm_bank_count;
+   for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) {
+   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+  GFP_KERNEL);
+   if (!tcm)
+   return -ENOMEM;
+
+   r5_core->tcm_banks[j] = tcm;
+
+   /* Get power-domains id of TCM. */
+   ret = of_parse_phandle_with_args(np, "power-domains",
+"#power-domain-cells",
+tcm_pd_idx, &out_args);
+   if (ret) {
+   dev_err(r5_core->dev,
+   "failed to get tcm %d pm domain, ret %d\n",
+   tcm_pd_idx, ret);
+   return ret;
+   }
+   tcm->pm_domain_id = out_args.args[0];
+   of_node_put(out_args.np);
+
+   /* Get TCM address without translation. */
+   ret = of_property_read_reg(np, j, &abs_addr, &size);
+   if (ret) {
+   dev_err(dev, "failed to get reg property\n");
+   return ret;
+   }
+
+

[PATCH v10 0/4] add zynqmp TCM bindings

2024-02-13 Thread Tanmay Shah
Tightly-Coupled Memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two exclusive 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
In lockstep mode, both 128 KB memories (ATCM and BTCM) are accessible
to the cluster.

As per ZynqMP Ultrascale+ Technical Reference Manual UG1085, following
is address space of TCM memory. The bindings in this patch series
introduces properties to accommodate following address space with
address translation between Linux and Cortex-R5 views.

| *Mode*             | *R5 View*    | *Linux view* | Notes                |
| ------------------ | ------------ | ------------ | -------------------- |
| *Split Mode*       | *start addr* | *start addr* |                      |
| R5_0 ATCM (64 KB)  | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (64 KB)  | 0x0002_0000  | 0xFFE2_0000  |                      |
| R5_1 ATCM (64 KB)  | 0x0000_0000  | 0xFFE9_0000  | alias of 0xFFE1_0000 |
| R5_1 BTCM (64 KB)  | 0x0002_0000  | 0xFFEB_0000  | alias of 0xFFE3_0000 |
| *Lockstep Mode*    |              |              |                      |
| R5_0 ATCM (128 KB) | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (128 KB) | 0x0002_0000  | 0xFFE2_0000  |                      |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

Changes in v10:
  - Add new patch (1/4) to series that changes hardcode TCM addresses in
lockstep mode and removes separate handling of TCM in lockstep and
split mode
  - modify number of "reg", "reg-names" and "power-domains" entries
based on cluster mode
  - Add extra optional atcm and btcm in "reg" property for lockstep mode
  - Add "reg-names" for extra optional atcm and btcm for lockstep mode
  - Drop previous Ack as bindings has new change
  - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
  - Add each tcm's power-domains in lockstep mode
  - Drop previous Ack as new change in dts patchset
  - Remove redundant changes in driver to handle TCM in lockstep mode

Changes in v9:
  - Fix rproc lockstep dts
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.
  - Fix comments as suggested

Changes in v8:
  - Remove use of pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - %s/pm_dev1/pm_dev_core0/r
  - %s/pm_dev_link1/pm_dev_core0_link/r
  - %s/pm_dev2/pm_dev_core1/r
  - %s/pm_dev_link2/pm_dev_core1_link/r
  - remove pm_domain_id check to move next patch
  - add comment about how 1st entry in pm domain list is used
  - fix loop when jump to fail_add_pm_domains loop
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation

Changes in v6:
  - Introduce new node entry for r5f cluster split mode dts and
keep it disabled by default.
  - Keep remoteproc lockstep mode enabled by default to maintain
backward compatibility.
  - Enable split mode only for zcu102 board to demo split mode use
  - Remove spurious change
  - Handle errors in add_pm_domains function
  - Remove redundant code to handle errors from remove_pm_domains
  - Missing . at the end of the commit message
  - remove redundant initialization of variables
  - remove fail_tcm label and relevant code to free memory
acquired using devm_* API. As this will be freed when device free it
  - add extra check to see if "reg" property is supported or not

Changes in v5:
  - maintain Rob's Ack on bindings patch as no changes in bindings
  - split previous patch into multiple patches
  - Use pm domain framework to turn on/off TCM
  - Add support of parsing TCM information from device-tree
  - maintain backward compatibility with previous bindings without
TCM information available in device-tree

This patch series continues previous effort to upstream ZynqMP
TCM bindings:
Previous v4 version link:
https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/

Previous v3 version link:
https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/
Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings


Tanmay Shah (3):
  remoteproc: zynqmp: fix lockstep mode memory region
  dts: zynqmp: add properties for TCM in remoteproc
  remoteproc: zyn

[PATCH v10 1/4] remoteproc: zynqmp: fix lockstep mode memory region

2024-02-13 Thread Tanmay Shah
In lockstep mode, r5 core0 uses TCM of R5 core1. Following is lockstep
mode memory region as per hardware reference manual.

|  *TCM*             |  *R5 View*  | *Linux view* |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above, because the R5 core0 TCM
and core1 TCM have different power-domains mapped to them.
Hence, TCM address space in lockstep mode should be modeled as 64KB
regions only where each region has its own power-domain as following:

|  *TCM*             |  *R5 View*  | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This makes driver maintenance easier and makes the design robust for
future platforms as well.

Signed-off-by: Tanmay Shah 
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 145 ++--
 1 file changed, 12 insertions(+), 133 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..42b0384d34f2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
{0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-   {0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */
-   {0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"},
-   {0, 0, 0, PD_R5_1_ATCM, ""},
-   {0, 0, 0, PD_R5_1_BTCM, ""},
+   {0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
+   {0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"},
+   {0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"},
+   {0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -540,14 +540,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * add_tcm_carveout_split_mode()
+ * add_tcm_banks()
  * @rproc: single R5 core's corresponding rproc instance
  *
- * allocate and add remoteproc carveout for TCM memory in split mode
+ * allocate and add remoteproc carveout for TCM memory
  *
  * return 0 on success, otherwise non-zero value on failure
  */
-static int add_tcm_carveout_split_mode(struct rproc *rproc)
+static int add_tcm_banks(struct rproc *rproc)
 {
struct rproc_mem_entry *rproc_mem;
struct zynqmp_r5_core *r5_core;
@@ -580,10 +580,10 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret < 0) {
dev_err(dev, "failed to turn on TCM 0x%x", 
pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
-   dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, 
size=0x%lx",
+   dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx",
bank_name, bank_addr, da, bank_size);
 
rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr,
@@ -593,7 +593,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
if (!rproc_mem) {
ret = -ENOMEM;
zynqmp_pm_release_node(pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
rproc_add_carveout(rproc, rproc_mem);
@@ -601,7 +601,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
 
return 0;
 
-release_tcm_split:
+release_tcm:
/* If failed, Turn off all TCM banks turned on before */
for (i--; i >= 0; i--) {
pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id;
@@ -610,127 +610,6 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
return ret;
 }
 
-/*
- * add_tcm_carveout_lockstep_mode()
- * @rproc: single R5 core's corresponding rproc instance
- *
- * allocate and add remoteproc carveout for TCM memory in lockstep mode
- *
- * return 0 on success, otherwise non-zero value on failure
- */
-static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
-{
-   struct rproc_mem_entry *rproc_mem;
-   struct zynqmp_r5_core *r5_core;
-   int i, num_banks, ret;
-   phys_addr_t bank_addr;
-   size_t bank_size = 0;
-   struct device *dev;
-   u32 pm_domain_id;
-   char *bank_name;
-   u32 da;
-
-   r5_core = rproc->priv;
-   dev = r5_core->dev;
-
-   /* Go through zynqmp banks for r5 node */
-   num_banks = r5_core->tcm_bank

[PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation

2024-02-13 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The saved_cmdlines have three arrays for mapping PIDs to COMMs:

 - map_pid_to_cmdline[]
 - map_cmdline_to_pid[]
 - saved_cmdlines

The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
into the other arrays. The map_cmdline_to_pid[] maps back to the full pid,
as it can be larger than PID_MAX_DEFAULT. And the saved_cmdlines[] just
holds the COMMs associated with the pids.

Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
together (in reality, saved_cmdlines[] simply occupies the memory left over
from rounding the structure's allocation up, as it is always allocated in
powers of two). The map_cmdline_to_pid[] array is allocated separately.

Since the rounding to a power of two is rather large (it allows for 8000
elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
(This drops it to 6000 by default, which is still plenty for most use
cases). This saves even more memory as the map_cmdline_to_pid[] array
doesn't need to be allocated.

Link: 
https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/

Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/

-- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.

 kernel/trace/trace_sched_switch.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c 
b/kernel/trace/trace_sched_switch.c
index e4fbcc3bede5..5f3e9bc87079 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -175,6 +175,10 @@ struct saved_cmdlines_buffer {
 };
 static struct saved_cmdlines_buffer *savedcmd;
 
+/* Holds the size of a cmdline and pid element */
+#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)  \
+   (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
+
 static inline char *get_saved_cmdlines(int idx)
 {
return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
@@ -201,7 +205,7 @@ static struct saved_cmdlines_buffer 
*allocate_cmdlines_buffer(unsigned int val)
int order;
 
/* Figure out how much is needed to hold the given number of cmdlines */
-   orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+   orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
order = get_order(orig_size);
size = 1 << (order + PAGE_SHIFT);
page = alloc_pages(GFP_KERNEL, order);
@@ -212,16 +216,11 @@ static struct saved_cmdlines_buffer 
*allocate_cmdlines_buffer(unsigned int val)
memset(s, 0, sizeof(*s));
 
/* Round up to actual allocation */
-   val = (size - sizeof(*s)) / TASK_COMM_LEN;
+   val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
s->cmdline_num = val;
 
-   s->map_cmdline_to_pid = kmalloc_array(val,
- sizeof(*s->map_cmdline_to_pid),
- GFP_KERNEL);
-   if (!s->map_cmdline_to_pid) {
-   free_saved_cmdlines_buffer(s);
-   return NULL;
-   }
+   /* Place map_cmdline_to_pid array right after saved_cmdlines */
+   s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * 
TASK_COMM_LEN];
 
s->cmdline_idx = 0;
memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
-- 
2.43.0




Re: [PATCH] tracing: Have saved_cmdlines arrays all in one allocation

2024-02-13 Thread Tim Chen
On Mon, 2024-02-12 at 19:13 -0500, Steven Rostedt wrote:
> On Mon, 12 Feb 2024 15:39:03 -0800
> Tim Chen  wrote:
> 
> > > diff --git a/kernel/trace/trace_sched_switch.c 
> > > b/kernel/trace/trace_sched_switch.c
> > > index e4fbcc3bede5..210c74dcd016 100644
> > > --- a/kernel/trace/trace_sched_switch.c
> > > +++ b/kernel/trace/trace_sched_switch.c
> > > @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer 
> > > *allocate_cmdlines_buffer(unsigned int val)
> > >   int order;
> > >  
> > >   /* Figure out how much is needed to hold the given number of cmdlines */
> > > - orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> > > + orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));  
> > 
> > Strictly speaking, *map_cmdline_to_pid is unsigned int so it is more 
> > consistent
> > to use sizeof(unsigned) in line above.  But I'm nitpicking and I'm fine to
> > leave it as is.
> 
> I was thinking about making that into a macro as it is used in two places.
> 
> /* Holds the size of a cmdline and pid element */
> #define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \
>   (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
> 
>   orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
> 
> 

Looks good. This makes the code more readable.

Tim



Re: [PATCH V2 3/3] vdpa_sim: flush workers on suspend

2024-02-13 Thread Eugenio Perez Martin
On Mon, Feb 12, 2024 at 6:16 PM Steve Sistare  wrote:
>
> Flush to guarantee no workers are running when suspend returns.
>
> Signed-off-by: Steve Sistare 
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index be2925d0d283..a662b90357c3 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim 
> *vdpasim,
> kthread_flush_work(work);
>  }
>
> +static void flush_work_fn(struct kthread_work *work) {}
> +
> +static void vdpasim_flush_work(struct vdpasim *vdpasim)
> +{
> +   struct kthread_work work;
> +
> +   kthread_init_work(&work, flush_work_fn);

If the work is already queued, doesn't it break the linked list
because of the memset in kthread_init_work?

> +   kthread_queue_work(vdpasim->worker, &work);
> +   kthread_flush_work(&work);
> +}
> +
>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>  {
> return container_of(vdpa, struct vdpasim, vdpa);
> @@ -511,6 +522,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
> vdpasim->running = false;
> mutex_unlock(&vdpasim->mutex);
>
> +   vdpasim_flush_work(vdpasim);

Do we need to protect the case where vdpasim_kick_vq and
vdpasim_suspend are called "at the same time"? Correct userland should
not be doing it, but buggy or malicious ones could be. Just calling
vdpasim_flush_work with the mutex acquired would solve the issue,
wouldn't it?

Thanks!

> +
> return 0;
>  }
>
> --
> 2.39.3
>




Re: [PATCH V2 1/3] vhost-vdpa: flush workers on suspend

2024-02-13 Thread Eugenio Perez Martin
On Mon, Feb 12, 2024 at 6:16 PM Steve Sistare  wrote:
>
> Flush to guarantee no workers are running when suspend returns.
>
> Signed-off-by: Steve Sistare 

Acked-by: Eugenio Pérez 

Should this have a Fixes tag?

> ---
>  drivers/vhost/vdpa.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index bc4a51e4638b..a3b986c24805 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -594,10 +594,13 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> struct vdpa_device *vdpa = v->vdpa;
> const struct vdpa_config_ops *ops = vdpa->config;
> int ret;
> +   struct vhost_dev *vdev = &v->vdev;
>
> if (!ops->suspend)
> return -EOPNOTSUPP;
>
> +   vhost_dev_flush(vdev);
> +
> ret = ops->suspend(vdpa);
> if (!ret)
> v->suspended = true;
> --
> 2.39.3
>




Re: [Linux-stm32] [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-02-13 Thread Arnaud POULIQUEN
Hello Mathieu,

On 2/5/24 10:13, Arnaud POULIQUEN wrote:
> 
> 
> On 2/2/24 20:53, Mathieu Poirier wrote:
>> On Thu, Feb 01, 2024 at 07:33:35PM +0100, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 2/1/24 17:02, Mathieu Poirier wrote:
 On Thu, Feb 01, 2024 at 04:06:37PM +0100, Arnaud POULIQUEN wrote:
> hello Mathieu,
>
> On 1/31/24 19:52, Mathieu Poirier wrote:
>> On Tue, Jan 30, 2024 at 10:13:48AM +0100, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 1/26/24 18:11, Mathieu Poirier wrote:
 On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote:
> The new TEE remoteproc device is used to manage remote firmware in a
> secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
> introduced to delegate the loading of the firmware to the trusted
> execution context. In such cases, the firmware should be signed and
> adhere to the image format defined by the TEE.
>
> Signed-off-by: Arnaud Pouliquen 
> ---
> V1 to V2 update:
> - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as 
> detected by
>   the kernel test robot:
>  WARNING: unmet direct dependencies detected for TEE_REMOTEPROC
>  Depends on [n]: REMOTEPROC [=y] && OPTEE [=n]
>  Selected by [y]:
>  - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && 
> REMOTEPROC [=y]
> - Fix initialized trproc variable in  stm32_rproc_probe
> ---
>  drivers/remoteproc/stm32_rproc.c | 149 
> +--
>  1 file changed, 144 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/remoteproc/stm32_rproc.c 
> b/drivers/remoteproc/stm32_rproc.c
> index fcc0001e2657..cf6a21bac945 100644
> --- a/drivers/remoteproc/stm32_rproc.c
> +++ b/drivers/remoteproc/stm32_rproc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "remoteproc_internal.h"
> @@ -49,6 +50,9 @@
>  #define M4_STATE_STANDBY 4
>  #define M4_STATE_CRASH   5
>  
> +/* Remote processor unique identifier aligned with the Trusted 
> Execution Environment definitions */
> +#define STM32_MP1_M4_PROC_ID0
> +
>  struct stm32_syscon {
>   struct regmap *map;
>   u32 reg;
> @@ -90,6 +94,8 @@ struct stm32_rproc {
>   struct stm32_mbox mb[MBOX_NB_MBX];
>   struct workqueue_struct *workqueue;
>   bool hold_boot_smc;
> + bool fw_loaded;
> + struct tee_rproc *trproc;
>   void __iomem *rsc_va;
>  };
>  
> @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc 
> *rproc)
>   return err;
>  }
>  
> +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc,
> + const struct firmware *fw)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> + unsigned int ret = 0;
> +
> + if (rproc->state == RPROC_DETACHED)
> + return 0;
> +
> + ret = tee_rproc_load_fw(ddata->trproc, fw);
> + if (!ret)
> + ddata->fw_loaded = true;
> +
> + return ret;
> +}
> +
> +static int stm32_rproc_tee_elf_load(struct rproc *rproc,
> + const struct firmware *fw)
> +{
> + struct stm32_rproc *ddata = rproc->priv;
> + unsigned int ret;
> +
> + /*
> +  * This function can be called by remote proc for recovery
> +  * without the sanity check. In this case we need to load the 
> firmware
> +  * else nothing done here as the firmware has been preloaded 
> for the
> +  * sanity check to be able to parse it for the resource table.
> +  */

 This comment is very confusing - please consider refactoring.  

> + if (ddata->fw_loaded)
> + return 0;
> +

 I'm not sure about keeping a flag to indicate the status of the loaded 
 firmware.
 It is not done for the non-secure method, I don't see why it would be 
 needed for
 the secure one.

>>>
>>> The difference is on the sanity check.
>>> - in rproc_elf_sanity_check we  parse the elf file to verify that it is
>>> valid.
>>> - in stm32_rproc_tee_elf_sanity_check we have to do the same, that 
>>> means to
>>> authenticate it. the authentication is done during the load.
>>>
>>> So this flag is used to avoid 

Re: [PATCH v2 2/4] dt-bindings: remoteproc: Add compatibility for TEE support

2024-02-13 Thread Arnaud POULIQUEN
Hello Rob,

On 1/30/24 18:51, Rob Herring wrote:
> On Thu, Jan 18, 2024 at 11:04:31AM +0100, Arnaud Pouliquen wrote:
>> The "st,stm32mp1-m4-tee" compatible is utilized in a system configuration
>> where the Cortex-M4 firmware is loaded by the Trusted execution Environment
>> (TEE).
>> For instance, this compatible is used in both the Linux and OP-TEE
>> device-tree:
>> - In OP-TEE, a node is defined in the device tree with the
>>   st,stm32mp1-m4-tee to support signed remoteproc firmware.
>>   Based on DT properties, OP-TEE authenticates, loads, starts, and stops
>>   the firmware.
>> - On Linux, when the compatibility is set, the Cortex-M resets should not
>>   be declared in the device tree.
>>
>> Signed-off-by: Arnaud Pouliquen 
>> ---
>> V1 to V2 updates
>> - update "st,stm32mp1-m4" compatible description to generalize
>> - remove the 'reset-names' requirement in one conditional branch, as the
>>   property is already part of the condition test.
>> ---
>>  .../bindings/remoteproc/st,stm32-rproc.yaml   | 52 +++
>>  1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml 
>> b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> index 370af61d8f28..6af821b15736 100644
>> --- a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> +++ b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> @@ -16,7 +16,12 @@ maintainers:
>>  
>>  properties:
>>compatible:
>> -const: st,stm32mp1-m4
>> +enum:
>> +  - st,stm32mp1-m4
>> +  - st,stm32mp1-m4-tee
>> +description:
>> +  Use "st,stm32mp1-m4" for the Cortex-M4 coprocessor management by 
>> non-secure context
>> +  Use "st,stm32mp1-m4-tee" for the Cortex-M4 coprocessor management by 
>> secure context
>>  
>>reg:
>>  description:
>> @@ -142,21 +147,40 @@ properties:
>>  required:
>>- compatible
>>- reg
>> -  - resets
>>  
>>  allOf:
>>- if:
>>properties:
>> -reset-names:
>> -  not:
>> -contains:
>> -  const: hold_boot
>> +compatible:
>> +  contains:
>> +const: st,stm32mp1-m4
>> +then:
>> +  if:
>> +properties:
>> +  reset-names:
>> +not:
>> +  contains:
>> +const: hold_boot
> 
> Note that this is true when 'reset-names' is not present. If that is not 
> desired, then you need 'required: [reset-names]'. Not really a new issue 
> though.
> 

Yes, that corresponds to my expectation, for compatibility with legacy DTs.
If the hold_boot reset was not used, reset-names was not mandatory.
I will add the 'required: [reset-names]' in the else branch.

Thanks,
Arnaud

>> +  then:
>> +required:
>> +  - st,syscfg-holdboot
>> +  - resets
>> +  else:
>> +properties:
>> +  st,syscfg-holdboot: false
>> +required:
>> +  - resets
> 
> 'resets' is always required within the outer 'then' schema, so you can 
> move this up a level.
> 
>> +
>> +  - if:
>> +  properties:
>> +compatible:
>> +  contains:
>> +const: st,stm32mp1-m4-tee
>>  then:
>> -  required:
>> -- st,syscfg-holdboot
>> -else:
>>properties:
>>  st,syscfg-holdboot: false
>> +reset-names: false
>> +resets: false
>>  
>>  additionalProperties: false
>>  
>> @@ -188,5 +212,15 @@ examples:
>>st,syscfg-rsc-tbl = <&tamp 0x144 0x>;
>>st,syscfg-m4-state = <&tamp 0x148 0x>;
>>  };
>> +  - |
>> +#include 
>> +m4@1000 {
>> +  compatible = "st,stm32mp1-m4-tee";
>> +  reg = <0x1000 0x4>,
>> +<0x3000 0x4>,
>> +<0x3800 0x1>;
>> +  st,syscfg-rsc-tbl = <&tamp 0x144 0x>;
>> +  st,syscfg-m4-state = <&tamp 0x148 0x>;
>> +};
>>  
>>  ...
>> -- 
>> 2.25.1
>>



Re: [PATCH V2] vdpa_sim: reset must not run

2024-02-13 Thread Steven Sistare
Changes in V2:
  - test DRIVER_OK instead of FEATURES_OK
  - post singly, and add RB and acks

- Steve

On 2/9/2024 5:30 PM, Steve Sistare wrote:
> vdpasim_do_reset sets running to true, which is wrong, as it allows
> vdpasim_kick_vq to post work requests before the device has been
> configured.  To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK
> is set.
> 
> Fixes: 0c89e2a3a9d0 ("vdpa_sim: Implement suspend vdpa op")
> Signed-off-by: Steve Sistare 
> Reviewed-by: Eugenio Pérez 
> Acked-by: Jason Wang 
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index be2925d0d283..18584ce70bf0 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -160,7 +160,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim, u32 
> flags)
>   }
>   }
>  
> - vdpasim->running = true;
> + vdpasim->running = false;
>   spin_unlock(&vdpasim->iommu_lock);
>  
>   vdpasim->features = 0;
> @@ -483,6 +483,7 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, 
> u8 status)
>  
>   mutex_lock(&vdpasim->mutex);
>   vdpasim->status = status;
> + vdpasim->running = (status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
>   mutex_unlock(&vdpasim->mutex);
>  }
>  



Re: [PATCH V2 0/3] flush workers on suspend

2024-02-13 Thread Steven Sistare
On 2/12/2024 12:16 PM, Steve Sistare wrote:
> Flush to guarantee no workers are running when suspend returns,
> for vdpa, vdpa_sim, and vduse.  (mlx5 already does so, via the path
> mlx5_vdpa_suspend -> unregister_link_notifier -> flush_workqueue.)

Changes in V2:
  - renamed "vduse: suspend" (was vduse: flush workers on suspend)
  - call vhost_dev_flush unconditionally in "vhost-vdpa: flush workers on 
suspend"

- Steve

> Steve Sistare (3):
>   vhost-vdpa: flush workers on suspend
>   vduse: suspend
>   vdpa_sim: flush workers on suspend
> 
>  drivers/vdpa/vdpa_sim/vdpa_sim.c   | 13 +
>  drivers/vdpa/vdpa_user/vduse_dev.c | 24 
>  drivers/vhost/vdpa.c   |  3 +++
>  3 files changed, 40 insertions(+)
> 



Re: [PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK

2024-02-13 Thread Steven Sistare
On 2/13/2024 9:58 AM, Eugenio Perez Martin wrote:
> On Tue, Feb 13, 2024 at 3:26 PM Steve Sistare  
> wrote:
>>
>> If a vdpa device is not in state DRIVER_OK, then there is no driver state
>> to preserve, so no need to call the suspend and resume driver ops.
>>
>> Suggested-by: Eugenio Perez Martin 
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Eugenio Pérez 
> 
> Please include a changelog from previous versions for the next series.
> 
> Thanks!

Will do, thanks - steve

>> ---
>>  drivers/vhost/vdpa.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index bc4a51e4638b..aef92a7c57f3 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
>> const struct vdpa_config_ops *ops = vdpa->config;
>> int ret;
>>
>> +   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
>> +   return 0;
>> +
>> if (!ops->suspend)
>> return -EOPNOTSUPP;
>>
>> @@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
>> const struct vdpa_config_ops *ops = vdpa->config;
>> int ret;
>>
>> +   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
>> +   return 0;
>> +
>> if (!ops->resume)
>> return -EOPNOTSUPP;
>>
>> --
>> 2.39.3
>>
> 



Re: [PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK

2024-02-13 Thread Eugenio Perez Martin
On Tue, Feb 13, 2024 at 3:26 PM Steve Sistare  wrote:
>
> If a vdpa device is not in state DRIVER_OK, then there is no driver state
> to preserve, so no need to call the suspend and resume driver ops.
>
> Suggested-by: Eugenio Perez Martin 
> Signed-off-by: Steve Sistare 

Reviewed-by: Eugenio Pérez 

Please include a changelog from previous versions for the next series.

Thanks!

> ---
>  drivers/vhost/vdpa.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index bc4a51e4638b..aef92a7c57f3 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> const struct vdpa_config_ops *ops = vdpa->config;
> int ret;
>
> +   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> +   return 0;
> +
> if (!ops->suspend)
> return -EOPNOTSUPP;
>
> @@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
> const struct vdpa_config_ops *ops = vdpa->config;
> int ret;
>
> +   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> +   return 0;
> +
> if (!ops->resume)
> return -EOPNOTSUPP;
>
> --
> 2.39.3
>




Re: [PATCH v6.1.y-v4.19.y] vhost: use kzalloc() instead of kmalloc() followed by memset()

2024-02-13 Thread Greg KH
On Mon, Feb 05, 2024 at 10:49:37AM +0530, Ajay Kaher wrote:
> From: Prathu Baronia 
> 
> 
> commit 4d8df0f5f79f747d75a7d356d9b9ea40a4e4c8a9 upstream
> 
> Use kzalloc() to allocate new zeroed out msg node instead of
> memsetting a node allocated with kmalloc().
> 
> Signed-off-by: Prathu Baronia 
> Message-Id: <20230522085019.42914-1-prathubaronia2...@gmail.com>
> Signed-off-by: Michael S. Tsirkin 
> Reviewed-by: Stefano Garzarella 
> [Ajay: This is a security fix as per CVE-2024-0340]

And this is why I am so grumpy about Red Hat and CVEs, they know how to
let us know about stuff like this, but no.  Hopefully, someday soon,
they will not be allowed to do this anymore.

{sigh}

Now queued up, thanks.

greg k-h



[PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK

2024-02-13 Thread Steve Sistare
If a vdpa device is not in state DRIVER_OK, then there is no driver state
to preserve, so no need to call the suspend and resume driver ops.

Suggested-by: Eugenio Perez Martin 
Signed-off-by: Steve Sistare 
---
 drivers/vhost/vdpa.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index bc4a51e4638b..aef92a7c57f3 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
const struct vdpa_config_ops *ops = vdpa->config;
int ret;
 
+   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+   return 0;
+
if (!ops->suspend)
return -EOPNOTSUPP;
 
@@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
const struct vdpa_config_ops *ops = vdpa->config;
int ret;
 
+   if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+   return 0;
+
if (!ops->resume)
return -EOPNOTSUPP;
 
-- 
2.39.3




Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

2024-02-13 Thread Eric Dumazet
On Fri, Feb 2, 2024 at 5:55 PM Breno Leitao  wrote:
>
> From: Jakub Kicinski 
>
> softnet_data->time_squeeze is sometimes used as a proxy for
> host overload or indication of scheduling problems. In practice
> this statistic is very noisy and has hard to grasp units -
> e.g. is 10 squeezes a second to be expected, or high?
>
> Delaying network (NAPI) processing leads to drops on NIC queues
> but also RTT bloat, impacting pacing and CA decisions.
> Stalls are a little hard to detect on the Rx side, because
> there may simply have not been any packets received in given
> period of time. Packet timestamps help a little bit, but
> again we don't know if packets are stale because we're
> not keeping up or because someone (*cough* cgroups)
> disabled IRQs for a long time.

Please note that adding other sysfs entries is expensive for workloads
creating/deleting netdev and netns often.

I _think_ we should find a way for not creating
/sys/class/net//queues/tx-{Q}/byte_queue_limits  directory
and files
for non BQL enabled devices (like loopback !)



[PATCH v1] virtio: reenable config if freezing device failed

2024-02-13 Thread David Hildenbrand
Currently, we don't reenable the config if freezing the device failed.

For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.

Let's fix this by re-enabling the config if freezing fails.

Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Xuan Zhuo 
Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b351..f513ee21b1c1 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
 
if (drv && drv->freeze) {
ret = drv->freeze(dev);
-   if (ret)
+   if (ret) {
+   virtio_config_enable(dev);
return ret;
+   }
}
 
if (dev->config->destroy_avq)

base-commit: c664e16bb1ba1c8cf1d7ecf3df5fd83bbb8ac15a
-- 
2.43.0




Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

2024-02-13 Thread Breno Leitao
On Tue, Feb 06, 2024 at 12:40:13PM +0100, Paolo Abeni wrote:
> On Fri, 2024-02-02 at 08:53 -0800, Breno Leitao wrote:
> > From: Jakub Kicinski 
> > 
> > softnet_data->time_squeeze is sometimes used as a proxy for
> > host overload or indication of scheduling problems. In practice
> > this statistic is very noisy and has hard to grasp units -
> > e.g. is 10 squeezes a second to be expected, or high?
> > 
> > Delaying network (NAPI) processing leads to drops on NIC queues
> > but also RTT bloat, impacting pacing and CA decisions.
> > Stalls are a little hard to detect on the Rx side, because
> > there may simply have not been any packets received in given
> > period of time. Packet timestamps help a little bit, but
> > again we don't know if packets are stale because we're
> > not keeping up or because someone (*cough* cgroups)
> > disabled IRQs for a long time.
> > 
> > We can, however, use Tx as a proxy for Rx stalls. Most drivers
> > use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
> > On the Tx side we know exactly when packets get queued,
> > and completed, so there is no uncertainty.
> > 
> > This patch adds stall checks to BQL. Why BQL? Because
> > it's a convenient place to add such checks, already
> > called by most drivers, and it has copious free space
> > in its structures (this patch adds no extra cache
> > references or dirtying to the fast path).
> > 
> > The algorithm takes one parameter - max delay AKA stall
> > threshold and increments a counter whenever NAPI got delayed
> > for at least that amount of time. It also records the length
> > of the longest stall.
> > 
> > To be precise every time NAPI has not polled for at least
> > stall thrs we check if there were any Tx packets queued
> > between last NAPI run and now - stall_thrs/2.
> > 
> > Unlike the classic Tx watchdog this mechanism does not
> > ignore stalls caused by Tx being disabled, or loss of link.
> > I don't think the check is worth the complexity, and
> > stall is a stall, whether due to host overload, flow
> > control, link down... doesn't matter much to the application.
> > 
> > We have been running this detector in production at Meta
> > for 2 years, with the threshold of 8ms. It's the lowest
> > value where false positives become rare. There's still
> > a constant stream of reported stalls (especially without
> > the ksoftirqd deferral patches reverted), those who like
> > their stall metrics to be 0 may prefer higher value.
> > 
> > Signed-off-by: Jakub Kicinski 
> > Signed-off-by: Breno Leitao 
> > ---
> > Changelog:
> > 
> > v1:
> >   * https://lore.kernel.org/netdev/202306172057.jx7yhliu-...@intel.com/T/
> > 
> > v2:
> >   * Fix the documentation file in patch 0001, since patch 0002 will
> > touch it later.
> >   * Fix the kernel test robot issues, marking functions as statics.
> >   * Use #include  instead of .
> >   * Added some new comments around, mainly around barriers.
> >   * Format struct `netdev_queue_attribute` assignments to make
> > checkpatch happy.
> >   * Updated and fixed the path in sysfs-class-net-queues
> > documentation.
> > 
> > v3:
> >   * Sending patch 0002 against net-next.
> > - The first patch was accepted against 'net'
> > 
> > ---
> >  .../ABI/testing/sysfs-class-net-queues| 23 +++
> >  include/linux/dynamic_queue_limits.h  | 35 +++
> >  include/trace/events/napi.h   | 33 ++
> >  lib/dynamic_queue_limits.c| 58 +
> >  net/core/net-sysfs.c  | 62 +++
> >  5 files changed, 211 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-class-net-queues 
> > b/Documentation/ABI/testing/sysfs-class-net-queues
> > index 5bff64d256c2..45bab9b6e3ae 100644
> > --- a/Documentation/ABI/testing/sysfs-class-net-queues
> > +++ b/Documentation/ABI/testing/sysfs-class-net-queues
> > @@ -96,3 +96,26 @@ Description:
> > Indicates the absolute minimum limit of bytes allowed to be
> > queued on this network device transmit queue. Default value is
> > 0.
> > +
> > +What:  
> > /sys/class/net//queues/tx-/byte_queue_limits/stall_thrs
> > +Date:  Jan 2024
> > +KernelVersion: 6.9
> > +Contact:   net...@vger.kernel.org
> > +Description:
> > +   Tx completion stall detection threshold. 
> 
> Possibly worth mentioning it's in milliseconds
> 
> > Kernel will guarantee
> > +   to detect all stalls longer than this threshold but may also
> > +   detect stalls longer than half of the threshold.
> > +
> > +What:  
> > /sys/class/net//queues/tx-/byte_queue_limits/stall_cnt
> > +Date:  Jan 2024
> > +KernelVersion: 6.9
> > +Contact:   net...@vger.kernel.org
> > +Description:
> > +   Number of detected Tx completion stalls.
> > +
> > +What:  
> > /sys/class/net//queues/tx-/byte_queue_limits/stall_max
> > +Date:  Jan 2024
> >

[PATCH] tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef

2024-02-13 Thread Petr Pavlu
Commit a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by
default") attempted to fix an issue with direct trampolines on x86, see
its description for details. However, it wrongly referenced the
HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still
present.

Add the missing "CONFIG_" prefix for the logic to work as intended.

Fixes: a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default")
Signed-off-by: Petr Pavlu 
---
 kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c060d5b47910..83ba342aef31 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5331,7 +5331,7 @@ static int register_ftrace_function_nolock(struct 
ftrace_ops *ops);
  * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it
  * jumps from ftrace_caller for multiple ftrace_ops.
  */
-#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS
+#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS
 #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
 #else
 #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)

base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de
-- 
2.35.3




Re: [PATCH v2 0/3] Add RPMPD support for MSM8974

2024-02-13 Thread Ulf Hansson
On Sat, 10 Feb 2024 at 17:39, Luca Weiss  wrote:
>
> Add driver support for the RPM power domains found on the different
> MSM8974 devices.
>
> Devicetree integration will come at a later point since also some
> mostly remoteproc drivers need to be adjusted.
>
> Also the MX power domains on this SoC seem to work quite a bit
> differently, we'd need to send raw voltages to them, so these are ignored
> in this series.
>
> Signed-off-by: Luca Weiss 

Applied for next, thanks!

Note that patch1 is also available at the immutable dt branch, if soc
maintainers need to pull it in too.

Kind regards
Uffe


> ---
> Changes in v2:
> - Drop MSM8974_VDDGFX_AO in all patches
> - Link to v1: 
> https://lore.kernel.org/r/20240210-msm8974-rpmpd-v1-0-de9355e68...@z3ntu.xyz
>
> ---
> Luca Weiss (3):
>   dt-bindings: power: rpmpd: Add MSM8974 power domains
>   pmdomain: qcom: rpmpd: Add MSM8974+PM8841 power domains
>   pmdomain: qcom: rpmpd: Add MSM8974PRO+PMA8084 power domains
>
>  .../devicetree/bindings/power/qcom,rpmpd.yaml  |  2 +
>  drivers/pmdomain/qcom/rpmpd.c  | 83 
> ++
>  include/dt-bindings/power/qcom-rpmpd.h |  7 ++
>  3 files changed, 92 insertions(+)
> ---
> base-commit: 6e3fa474051f3d276ea708bdb8e8e1f66d1d3ee5
> change-id: 20240210-msm8974-rpmpd-6e48fe374275
>
> Best regards,
> --
> Luca Weiss 
>



[PATCH v17 5/6] Documentation: tracing: Add ring-buffer mapping

2024-02-13 Thread Vincent Donnefort
It is now possible to mmap() a ring-buffer to stream its content. Add
some documentation and a code example.

Signed-off-by: Vincent Donnefort 

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 5092d6c13af5..0b300901fd75 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -29,6 +29,7 @@ Linux Tracing Technologies
timerlat-tracer
intel_th
ring-buffer-design
+   ring-buffer-map
stm
sys-t
coresight/index
diff --git a/Documentation/trace/ring-buffer-map.rst 
b/Documentation/trace/ring-buffer-map.rst
new file mode 100644
index ..0426ab4bcf3d
--- /dev/null
+++ b/Documentation/trace/ring-buffer-map.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Tracefs ring-buffer memory mapping
+==================================
+
+:Author: Vincent Donnefort 
+
+Overview
+========
+
+Tracefs ring-buffer memory map provides an efficient method to stream data
+as no memory copy is necessary. The application mapping the ring-buffer then
+becomes a consumer for that ring-buffer, in a similar fashion to trace_pipe.
+
+Memory mapping setup
+====================
+
+The mapping works with a mmap() of the trace_pipe_raw interface.
+
+The first system page of the mapping contains ring-buffer statistics and a
+description. It is referred to as the meta-page. One of the most important
+fields of the meta-page is the reader. It contains the sub-buffer ID which can
+be safely read by the mapper (see ring-buffer-design.rst).
+
+The meta-page is followed by all the sub-buffers, ordered by ascending ID. It
+is therefore straightforward to know where the reader starts in the mapping:
+
+.. code-block:: c
+
+reader_id = meta->reader->id;
+reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
+
+When the application is done with the current reader, it can get a new one
+using the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also
+updates the meta-page fields.
+
+Limitations
+===========
+
+When a mapping is in place on a Tracefs ring-buffer, it is not possible to
+resize it (either by increasing the entire size of the ring-buffer or the
+size of each subbuf). It is also not possible to use snapshot, and splice
+will copy the ring-buffer data instead of using the copyless swap from the
+ring-buffer.
+
+Concurrent readers (either another application mapping that ring-buffer or the
+kernel with trace_pipe) are allowed but not recommended. They will compete for
+the ring-buffer and the output is unpredictable, just like concurrent readers
+on trace_pipe would be.
+
+Example
+=======
+
+.. code-block:: c
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/trace_mmap.h>
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+
+#define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw"
+
+int main(void)
+{
+	int page_size = getpagesize(), fd, reader_id;
+	unsigned long meta_len, data_len;
+	struct trace_buffer_meta *meta;
+	void *map, *reader, *data;
+
+	fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK);
+	if (fd < 0)
+		exit(EXIT_FAILURE);
+
+	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
+	if (map == MAP_FAILED)
+		exit(EXIT_FAILURE);
+
+	meta = (struct trace_buffer_meta *)map;
+	meta_len = meta->meta_page_size;
+
+	printf("entries:    %llu\n", meta->entries);
+	printf("overrun:    %llu\n", meta->overrun);
+	printf("read:       %llu\n", meta->read);
+	printf("nr_subbufs: %u\n", meta->nr_subbufs);
+
+	data_len = meta->subbuf_size * meta->nr_subbufs;
+	data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len);
+	if (data == MAP_FAILED)
+		exit(EXIT_FAILURE);
+
+	if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
+		exit(EXIT_FAILURE);
+
+	reader_id = meta->reader.id;
+	reader = data + meta->subbuf_size * reader_id;
+
+	printf("Current reader address: %p\n", reader);
+
+	munmap(data, data_len);
+	munmap(meta, meta_len);
+	close(fd);
+
+	return 0;
+}
-- 
2.43.0.687.g38aa6559b0-goog




[PATCH v17 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-13 Thread Vincent Donnefort
Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.

A solution for that problem is to let the user-space map the ring-buffer
directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:

  * Meta-page -- see include/uapi/linux/trace_mmap.h for a description
  * Subbuf ID 0
  * Subbuf ID 1
 ...

It is therefore easy to translate a subbuf ID into an offset in the
mapping:

  reader_id = meta->reader->id;
  reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.

Signed-off-by: Vincent Donnefort 

diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index ffcd8dfcaa4f..d25b9d504a7c 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
__u64   Reserved2;
 };
 
+#define TRACE_MMAP_IOCTL_GET_READER_IO('T', 0x1)
+
 #endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ebf4d0bd14c..c9e513a95cb7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct 
trace_array *tr,
return;
}
 
+   if (tr->mapped) {
+   trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+   trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+   return;
+   }
+
local_irq_save(flags);
update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
@@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array 
*tr)
lockdep_assert_held(&trace_types_lock);
 
spin_lock(&tr->snapshot_trigger_lock);
-   if (tr->snapshot == UINT_MAX) {
+   if (tr->snapshot == UINT_MAX || tr->mapped) {
spin_unlock(&tr->snapshot_trigger_lock);
return -EBUSY;
}
@@ -6533,7 +6539,7 @@ static void tracing_set_nop(struct trace_array *tr)
 {
if (tr->current_trace == &nop_trace)
return;
-   
+
tr->current_trace->enabled--;
 
if (tr->current_trace->reset)
@@ -8652,15 +8658,31 @@ tracing_buffers_splice_read(struct file *file, loff_t 
*ppos,
return ret;
 }
 
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
 static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
 {
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
+   int err;
+
+   if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+   if (!(file->f_flags & O_NONBLOCK)) {
+   err = ring_buffer_wait(iter->array_buffer->buffer,
+  iter->cpu_file,
+  iter->tr->buffer_percent);
+   if (err)
+   return err;
+   }
 
-   if (cmd)
-   return -ENOIOCTLCMD;
+   return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+ iter->cpu_file);
+   } else if (cmd) {
+   return -ENOTTY;
+   }
 
+   /*
+* An ioctl call with cmd 0 to the ring buffer file will wake up all
+* waiters
+*/
mutex_lock(&trace_types_lock);
 
iter->wait_index++;
@@ -8673,6 +8695,110 @@ static long tracing_buffers_ioctl(struct file *file, 
unsigned int cmd, unsigned
return 0;
 }
 
+static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)
+{
+   struct ftrace_buffer_info *info = vmf->vma->vm_file->private_data;
+   struct trace_iterator *iter = &info->iter;
+   vm_fault_t ret = VM_FAULT_SIGBUS;
+   struct page *page;
+
+   page = ring_buffer_map_fault(iter->array_buffer->buffer, iter->cpu_file,
+vmf->pgoff);
+   if (!page)
+   return ret;
+
+   get_page(page);
+   vmf->page = page;
+   vmf->page->mapping = vmf->vma->vm_file->f_mapping;
+   vmf->page->index = vmf->pgoff;
+
+   return 0;
+}
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int get_snapshot_map(struct trace_array *tr)
+{
+   int err = 0;
+
+   /*
+* Called with mmap_lock held. lockdep would be unhappy if we would now
+* take trace_types_lock. Instead use the specific
+* s

[PATCH v17 3/6] tracing: Add snapshot refcount

2024-02-13 Thread Vincent Donnefort
When a ring-buffer is memory mapped by user-space, no trace or
ring-buffer swap is possible. This means the snapshot feature is
mutually exclusive with the memory mapping. Having a refcount on
snapshot users will help to know if a mapping is possible or not.

Instead of relying on the global trace_types_lock, a new spinlock is
introduced to serialize accesses to trace_array->snapshot. This intends
to allow access to that variable in a context where the mmap lock is
already held.

Signed-off-by: Vincent Donnefort 

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a7c6fd934e9..4ebf4d0bd14c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1300,6 +1300,50 @@ static void free_snapshot(struct trace_array *tr)
tr->allocated_snapshot = false;
 }
 
+static int tracing_arm_snapshot_locked(struct trace_array *tr)
+{
+   int ret;
+
+   lockdep_assert_held(&trace_types_lock);
+
+   spin_lock(&tr->snapshot_trigger_lock);
+   if (tr->snapshot == UINT_MAX) {
+   spin_unlock(&tr->snapshot_trigger_lock);
+   return -EBUSY;
+   }
+
+   tr->snapshot++;
+   spin_unlock(&tr->snapshot_trigger_lock);
+
+   ret = tracing_alloc_snapshot_instance(tr);
+   if (ret) {
+   spin_lock(&tr->snapshot_trigger_lock);
+   tr->snapshot--;
+   spin_unlock(&tr->snapshot_trigger_lock);
+   }
+
+   return ret;
+}
+
+int tracing_arm_snapshot(struct trace_array *tr)
+{
+   int ret;
+
+   mutex_lock(&trace_types_lock);
+   ret = tracing_arm_snapshot_locked(tr);
+   mutex_unlock(&trace_types_lock);
+
+   return ret;
+}
+
+void tracing_disarm_snapshot(struct trace_array *tr)
+{
+   spin_lock(&tr->snapshot_trigger_lock);
+   if (!WARN_ON(!tr->snapshot))
+   tr->snapshot--;
+   spin_unlock(&tr->snapshot_trigger_lock);
+}
+
 /**
  * tracing_alloc_snapshot - allocate snapshot buffer.
  *
@@ -1373,10 +1417,6 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, 
void *cond_data,
 
mutex_lock(&trace_types_lock);
 
-   ret = tracing_alloc_snapshot_instance(tr);
-   if (ret)
-   goto fail_unlock;
-
if (tr->current_trace->use_max_tr) {
ret = -EBUSY;
goto fail_unlock;
@@ -1395,6 +1435,10 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, 
void *cond_data,
goto fail_unlock;
}
 
+   ret = tracing_arm_snapshot_locked(tr);
+   if (ret)
+   goto fail_unlock;
+
local_irq_disable();
arch_spin_lock(&tr->max_lock);
tr->cond_snapshot = cond_snapshot;
@@ -1439,6 +1483,8 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
arch_spin_unlock(&tr->max_lock);
local_irq_enable();
 
+   tracing_disarm_snapshot(tr);
+
return ret;
 }
 EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable);
@@ -1481,6 +1527,7 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
 }
 EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable);
 #define free_snapshot(tr)  do { } while (0)
+#define tracing_arm_snapshot_locked(tr) ({ -EBUSY; })
 #endif /* CONFIG_TRACER_SNAPSHOT */
 
 void tracer_tracing_off(struct trace_array *tr)
@@ -6593,11 +6640,12 @@ int tracing_set_tracer(struct trace_array *tr, const 
char *buf)
 */
synchronize_rcu();
free_snapshot(tr);
+   tracing_disarm_snapshot(tr);
}
 
-   if (t->use_max_tr && !tr->allocated_snapshot) {
-   ret = tracing_alloc_snapshot_instance(tr);
-   if (ret < 0)
+   if (t->use_max_tr) {
+   ret = tracing_arm_snapshot_locked(tr);
+   if (ret)
goto out;
}
 #else
@@ -6606,8 +6654,13 @@ int tracing_set_tracer(struct trace_array *tr, const 
char *buf)
 
if (t->init) {
ret = tracer_init(t, tr);
-   if (ret)
+   if (ret) {
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (t->use_max_tr)
+   tracing_disarm_snapshot(tr);
+#endif
goto out;
+   }
}
 
tr->current_trace = t;
@@ -7709,10 +7762,11 @@ tracing_snapshot_write(struct file *filp, const char 
__user *ubuf, size_t cnt,
if (tr->allocated_snapshot)
ret = resize_buffer_duplicate_size(&tr->max_buffer,
&tr->array_buffer, iter->cpu_file);
-   else
-   ret = tracing_alloc_snapshot_instance(tr);
-   if (ret < 0)
+
+   ret = tracing_arm_snapshot_locked(tr);
+   if (ret)
break;
+
/* Now, we're going to swap */
if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
local_irq_disable();
@@ -7722,6 +7776,7 @@ tracing_snapshot_write(struct file *filp, const cha

[PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions

2024-02-13 Thread Vincent Donnefort
In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

  ring_buffer_{map,unmap}()
  ring_buffer_map_fault()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer; currently they are
  only identified through their in-kernel VA.

  A meta-page, where ring-buffer statistics and a description of the
  current reader are stored

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..0841ba8bab14 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+#include <uapi/linux/trace_mmap.h>
+
 struct trace_buffer;
 struct ring_buffer_iter;
 
@@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
hlist_node *node);
 #define trace_rb_cpu_prepare   NULL
 #endif
 
+int ring_buffer_map(struct trace_buffer *buffer, int cpu);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
+  unsigned long pgoff);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include <linux/types.h>
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:Number of subbufs in the ring-buffer, including the reader.
+ * @reader.lost_events:Number of events lost at the time of the reader swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index ca796675c0a1..8d4f4b88cba2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/trace_mmap.h>
 #include 
 #include 
 #include 
@@ -338,6 +339,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
 };
 
@@ -484,6 +486,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf addr */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_headnew_pages; /* new pages to add */
@@ -1548,6 +1556,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long 
nr_pages, int cpu)
init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(&cpu_buffer->irq_work.waiters);
init_waitqueue_head(&cpu_buffer->irq_work.full_w

[PATCH v17 1/6] ring-buffer: Zero ring-buffer sub-buffers

2024-02-13 Thread Vincent Donnefort
In preparation for the ring-buffer memory mapping where each subbuf will
be accessible to user-space, zero all the page allocations.

Signed-off-by: Vincent Donnefort 
Reviewed-by: Masami Hiramatsu (Google) 

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..ca796675c0a1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1472,7 +1472,8 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu 
*cpu_buffer,
 
list_add(&bpage->list, pages);
 
-   page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags,
+   page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+   mflags | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page)
goto free_pages;
@@ -1557,7 +1558,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long 
nr_pages, int cpu)
 
cpu_buffer->reader_page = bpage;
 
-   page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 
cpu_buffer->buffer->subbuf_order);
+   page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
+   cpu_buffer->buffer->subbuf_order);
if (!page)
goto fail_free_reader;
bpage->page = page_address(page);
@@ -5525,7 +5527,8 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, 
int cpu)
if (bpage->data)
goto out;
 
-   page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_NORETRY,
+   page = alloc_pages_node(cpu_to_node(cpu),
+   GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page) {
kfree(bpage);
-- 
2.43.0.687.g38aa6559b0-goog




[PATCH v17 0/6] Introducing trace buffer mapping by user-space

2024-02-13 Thread Vincent Donnefort
The tracing ring-buffers can be stored on disk or sent to network
without any copy via splice. However, the latter doesn't allow real-time
processing of the traces. A solution is to give userspace direct access
to the ring-buffer pages via a mapping. An application can now become a
consumer of the ring-buffer, in a similar fashion to what trace_pipe
offers.

Support for this new feature can already be found in libtracefs from
version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.

Vincent

v16 -> v17:
  * Documentation and comments improvements.
  * Create get/put_snapshot_map() for clearer code.
  * Replace kzalloc with kcalloc.
  * Fix -ENOMEM handling in rb_alloc_meta_page().
  * Move flush(cpu_buffer->reader_page) behind the reader lock.
  * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
mutex. (removes READ_ONCE/WRITE_ONCE accesses).

v15 -> v16:
  * Add comment for the dcache flush.
  * Remove now unnecessary WRITE_ONCE for the meta-page.

v14 -> v15:
  * Add meta-page and reader-page flush. Intends to fix the mapping
for VIVT and aliasing-VIPT data caches.
  * -EPERM on VM_EXEC.
  * Fix build warning !CONFIG_TRACER_MAX_TRACE.

v13 -> v14:
  * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
  * on unmap, sync meta-page teardown with the reader_lock instead of
the synchronize_rcu.
  * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
(intends to fix a lockdep issue)
  * Add kerneldoc for flags and Reserved fields.
  * Add kselftest for snapshot/map mutual exclusion.

v12 -> v13:
  * Swap subbufs_{touched,lost} for Reserved fields.
  * Add a flag field in the meta-page.
  * Fix CONFIG_TRACER_MAX_TRACE.
  * Rebase on top of trace/urgent.
  * Add a comment for try_unregister_trigger()

v11 -> v12:
  * Fix code sample mmap bug.
  * Add logging in sample code.
  * Reset tracer in selftest.
  * Add a refcount for the snapshot users.
  * Prevent mapping when there are snapshot users and vice versa.
  * Refine the meta-page.
  * Fix types in the meta-page.
  * Collect Reviewed-by.

v10 -> v11:
  * Add Documentation and code sample.
  * Add a selftest.
  * Move all the update to the meta-page into a single
rb_update_meta_page().
  * rb_update_meta_page() is now called from
ring_buffer_map_get_reader() to fix NOBLOCK callers.
  * kerneldoc for struct trace_meta_page.
  * Add a patch to zero all the ring-buffer allocations.

v9 -> v10:
  * Refactor rb_update_meta_page()
  * In-loop declaration for foreach_subbuf_page()
  * Check for cpu_buffer->mapped overflow

v8 -> v9:
  * Fix the unlock path in ring_buffer_map()
  * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
  * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)

v7 -> v8:
  * Drop the subbufs renaming into bpages
  * Use subbuf as a name when relevant

v6 -> v7:
  * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/
  * Support for subbufs
  * Rename subbufs into bpages

v5 -> v6:
  * Rebase on next-20230802.
  * (unsigned long) -> (void *) cast for virt_to_page().
  * Add a wait for the GET_READER_PAGE ioctl.
  * Move writer fields update (overrun/pages_lost/entries/pages_touched)
in the irq_work.
  * Rearrange id in struct buffer_page.
  * Rearrange the meta-page.
  * ring_buffer_meta_page -> trace_buffer_meta_page.
  * Add meta_struct_len into the meta-page.

v4 -> v5:
  * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)

v3 -> v4:
  * Add to the meta-page:
   - pages_lost / pages_read (allow to compute how full is the
 ring-buffer)
   - read (allow to compute how many entries can be read)
   - A reader_page struct.
  * Rename ring_buffer_meta_header -> ring_buffer_meta
  * Rename ring_buffer_get_reader_page -> ring_buffer_map_get_reader_page
  * Properly consume events on ring_buffer_map_get_reader_page() with
rb_advance_reader().

v2 -> v3:
  * Remove data page list (for non-consuming read)
** Implies removing order > 0 meta-page
  * Add a new meta page field ->read
  * Rename ring_buffer_meta_page_header into ring_buffer_meta_header

v1 -> v2:
  * Hide data_pages from the userspace struct
  * Fix META_PAGE_MAX_PAGES
  * Support for order > 0 meta-page
  * Add missing page->mapping.

Vincent Donnefort (6):
  ring-buffer: Zero ring-buffer sub-buffers
  ring-buffer: Introducing ring-buffer mapping functions
  tracing: Add snapshot refcount
  tracing: Allow user-space mapping of the ring-buffer
  Documentation: tracing: Add ring-buffer mapping
  ring-buffer/selftest: Add ring-buffer mapping test

 Documentation/trace/index.rst |   1 +
 Documentation/trace/ring-buffer-map.rst   | 106 +
 include/linux/ring_buffer.h   |   7 +
 include/uapi/linux/trace_mmap.h   |  48 +++
 kernel/trace/ring_buffer.c| 385 +-
 kernel/trace/trace.c  | 234 ++-
 kernel/trace/trace.h

Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.

2024-02-13 Thread Takashi Iwai
On Tue, 13 Feb 2024 10:07:35 +0100,
Michael S. Tsirkin wrote:
> 
> On Tue, Feb 13, 2024 at 10:02:24AM +0100, Takashi Iwai wrote:
> > On Tue, 13 Feb 2024 09:51:30 +0100,
> > Aiswarya Cyriac wrote:
> > > 
> > > Fix the following warning when building virtio_snd driver.
> > > 
> > > "
> > > *** CID 1583619:  Uninitialized variables  (UNINIT)
> > > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
> > > 288
> > > 289 break;
> > > 290   }
> > > 291
> > > 292   kfree(tlv);
> > > 293
> > > vvv CID 1583619:  Uninitialized variables  (UNINIT)
> > > vvv Using uninitialized value "rc".
> > > 294   return rc;
> > > 295 }
> > > 296
> > > 297 /**
> > > 298  * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED 
> > > element type.
> > > 299  * @snd: VirtIO sound device.
> > > "
> > > 
> > > Signed-off-by: Anton Yakovlev 
> > > Signed-off-by: Aiswarya Cyriac 
> > > Reported-by: coverity-bot 
> > > Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
> > > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")
> > 
> > Thanks, applied.
> > 
> > 
> > Takashi
> 
> Why did you apply it directly? The patch isn't great IMHO.
> Why not give people a couple of days to review?

Sure, we can wait.  I applied it quickly just since it's an obvious
fix and you haven't responded to the original patch over a month.


thanks,

Takashi



Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.

2024-02-13 Thread Michael S. Tsirkin
On Tue, Feb 13, 2024 at 10:02:24AM +0100, Takashi Iwai wrote:
> On Tue, 13 Feb 2024 09:51:30 +0100,
> Aiswarya Cyriac wrote:
> > 
> > Fix the following warning when building virtio_snd driver.
> > 
> > "
> > *** CID 1583619:  Uninitialized variables  (UNINIT)
> > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
> > 288
> > 289 break;
> > 290   }
> > 291
> > 292   kfree(tlv);
> > 293
> > vvv CID 1583619:  Uninitialized variables  (UNINIT)
> > vvv Using uninitialized value "rc".
> > 294   return rc;
> > 295 }
> > 296
> > 297 /**
> > 298  * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED 
> > element type.
> > 299  * @snd: VirtIO sound device.
> > "
> > 
> > Signed-off-by: Anton Yakovlev 
> > Signed-off-by: Aiswarya Cyriac 
> > Reported-by: coverity-bot 
> > Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
> > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")
> 
> Thanks, applied.
> 
> 
> Takashi

Why did you apply it directly? The patch isn't great IMHO.
Why not give people a couple of days to review?

-- 
MST




Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.

2024-02-13 Thread Michael S. Tsirkin
On Tue, Feb 13, 2024 at 09:51:30AM +0100, Aiswarya Cyriac wrote:
> Fix the following warning when building virtio_snd driver.
> 
> "
> *** CID 1583619:  Uninitialized variables  (UNINIT)
> sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
> 288
> 289 break;
> 290   }
> 291
> 292   kfree(tlv);
> 293
> vvv CID 1583619:  Uninitialized variables  (UNINIT)
> vvv Using uninitialized value "rc".
> 294   return rc;
> 295 }
> 296
> 297 /**
> 298  * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED 
> element type.
> 299  * @snd: VirtIO sound device.
> "
> 
> Signed-off-by: Anton Yakovlev 
> Signed-off-by: Aiswarya Cyriac 
> Reported-by: coverity-bot 
> Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
> Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")

I don't know enough about ALSA to say whether the patch is correct.  But
the commit log needs work: please, do not "fix warnings" - analyse the
code and explain whether there is a real issue and if yes what is it
and how it can trigger. Is an invalid op_flag ever passed?
If it's just a coverity false positive it might be ok to
work around that but document this.


> ---
>  sound/virtio/virtio_kctl.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/sound/virtio/virtio_kctl.c b/sound/virtio/virtio_kctl.c
> index 0c6ac74aca1e..d7a160c5db03 100644
> --- a/sound/virtio/virtio_kctl.c
> +++ b/sound/virtio/virtio_kctl.c
> @@ -286,6 +286,11 @@ static int virtsnd_kctl_tlv_op(struct snd_kcontrol 
> *kcontrol, int op_flag,
>   else
>   rc = virtsnd_ctl_msg_send(snd, msg, &sg, NULL, false);
>  
> + break;
> + default:
> + virtsnd_ctl_msg_unref(msg);
> + rc = -EINVAL;
> +

There's already virtsnd_ctl_msg_unref call above.
Also don't we need virtsnd_ctl_msg_unref on other error paths
such as EFAULT?
Unify error handling to fix it all then?

>   break;
>   }
>  
> -- 
> 2.43.0




Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.

2024-02-13 Thread Takashi Iwai
On Tue, 13 Feb 2024 09:51:30 +0100,
Aiswarya Cyriac wrote:
> 
> Fix the following warning when building virtio_snd driver.
> 
> "
> *** CID 1583619:  Uninitialized variables  (UNINIT)
> sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
> 288
> 289 break;
> 290   }
> 291
> 292   kfree(tlv);
> 293
> vvv CID 1583619:  Uninitialized variables  (UNINIT)
> vvv Using uninitialized value "rc".
> 294   return rc;
> 295 }
> 296
> 297 /**
> 298  * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED 
> element type.
> 299  * @snd: VirtIO sound device.
> "
> 
> Signed-off-by: Anton Yakovlev 
> Signed-off-by: Aiswarya Cyriac 
> Reported-by: coverity-bot 
> Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
> Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")

Thanks, applied.


Takashi



[PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.

2024-02-13 Thread Aiswarya Cyriac
Fix the following warning when building virtio_snd driver.

"
*** CID 1583619:  Uninitialized variables  (UNINIT)
sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
288
289 break;
290   }
291
292   kfree(tlv);
293
vvv CID 1583619:  Uninitialized variables  (UNINIT)
vvv Using uninitialized value "rc".
294   return rc;
295 }
296
297 /**
298  * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED element type.
299  * @snd: VirtIO sound device.
"

Signed-off-by: Anton Yakovlev 
Signed-off-by: Aiswarya Cyriac 
Reported-by: coverity-bot 
Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")
---
 sound/virtio/virtio_kctl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sound/virtio/virtio_kctl.c b/sound/virtio/virtio_kctl.c
index 0c6ac74aca1e..d7a160c5db03 100644
--- a/sound/virtio/virtio_kctl.c
+++ b/sound/virtio/virtio_kctl.c
@@ -286,6 +286,11 @@ static int virtsnd_kctl_tlv_op(struct snd_kcontrol *kcontrol, int op_flag,
else
rc = virtsnd_ctl_msg_send(snd, msg, &sg, NULL, false);
 
+   break;
+   default:
+   virtsnd_ctl_msg_unref(msg);
+   rc = -EINVAL;
+
break;
}
 
-- 
2.43.0




Re: [PATCH V1] vdpa: suspend and resume require DRIVER_OK

2024-02-13 Thread Eugenio Perez Martin
On Tue, Feb 13, 2024 at 8:49 AM Eugenio Perez Martin wrote:
>
> On Mon, Feb 12, 2024 at 9:20 AM Michael S. Tsirkin  wrote:
> >
> > On Fri, Feb 09, 2024 at 02:29:59PM -0800, Steve Sistare wrote:
> > > Calling suspend or resume requires VIRTIO_CONFIG_S_DRIVER_OK, for all
> > > vdpa devices.
> > >
> > > Suggested-by: Eugenio Perez Martin 
> > > Signed-off-by: Steve Sistare 
> >
> > I don't think failing suspend or resume makes sense though -
> > e.g. practically failing suspend will just prevent sleeping I think -
> > why should guest not having driver loaded prevent
> > system suspend?
> >
>
> In the QEMU case the vhost device has not started, so QEMU should
> allow the system suspension.
>
> I haven't tested the QEMU behavior on suspend (not poweroff) with the
> guest driver loaded, but I think QEMU should indeed block the
> suspension, as there is no way to recover the device after that
> without the guest's cooperation?
>
> > there's also state such as features set which does need to be
> > preserved.
> >
>
> That's true if the device does not support resuming. Well, in the
> particular case of features, maybe we need to keep it, as userspace
> could call VHOST_GET_FEATURES. But maybe we can clean some things,
> right.
>
> > I think the thing to do is to skip invoking suspend/resume callback, and in
> > fact checking suspend/resume altogether.
> >
>
> I don't follow this. What should be done in these cases by QEMU?
> 1) The device does not support suspend
> 2) The device support suspend but not resume
>
> In my opinion 1) should be forbidden, as we don't support resuming
> the device properly, and 2) can be allowed by fetching all the state.
>

Ok I missed the whole other thread, everything is clear now.

Thanks!

> Thanks!
>
> > > ---
> > >  drivers/vhost/vdpa.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index bc4a51e4638b..ce1882acfc3b 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -598,6 +598,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> > >   if (!ops->suspend)
> > >   return -EOPNOTSUPP;
> > >
> > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + return -EINVAL;
> > > +
> > >   ret = ops->suspend(vdpa);
> > >   if (!ret)
> > >   v->suspended = true;
> > > @@ -618,6 +621,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
> > >   if (!ops->resume)
> > >   return -EOPNOTSUPP;
> > >
> > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + return -EINVAL;
> > > +
> > >   ret = ops->resume(vdpa);
> > >   if (!ret)
> > >   v->suspended = false;
> > > --
> > > 2.39.3
> >
> >
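For illustration, the gating this patch adds can be modeled in a few lines of plain C — a hypothetical `dev` struct stands in for struct vhost_vdpa and its vdpa_config_ops; the real checks live in the vhost_vdpa_suspend()/vhost_vdpa_resume() handlers quoted in the diff:

```c
#include <errno.h>

/* Matches include/uapi/linux/virtio_config.h. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical stand-in for struct vhost_vdpa plus its ops table. */
struct dev {
	unsigned char status;	/* virtio device status bits */
	int has_suspend;	/* ops->suspend present? */
	int has_resume;		/* ops->resume present? */
	int suspended;
};

static int vdpa_suspend(struct dev *d)
{
	if (!d->has_suspend)
		return -EOPNOTSUPP;
	/* The patch's new check: refuse unless the driver is up. */
	if (!(d->status & VIRTIO_CONFIG_S_DRIVER_OK))
		return -EINVAL;
	d->suspended = 1;
	return 0;
}

static int vdpa_resume(struct dev *d)
{
	if (!d->has_resume)
		return -EOPNOTSUPP;
	if (!(d->status & VIRTIO_CONFIG_S_DRIVER_OK))
		return -EINVAL;
	d->suspended = 0;
	return 0;
}
```

A caller that issues suspend before the guest driver has set DRIVER_OK now gets -EINVAL instead of the device attempting a suspend it may not be able to recover from.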