Re: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events
Hello,

kernel test robot noticed "BUG:KASAN:slab-use-after-free_in_user_events_ioctl" on:

commit: fecc001d587ceeeb47043c20353f257e3f01b39f ("[PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events")
url: https://github.com/intel-lab-lkp/linux/commits/Beau-Belgrave/tracing-user_events-Prepare-find-delete-for-same-name-events/20240203-031207
patch link: https://lore.kernel.org/all/20240202184449.1674-2-be...@linux.microsoft.com/
patch subject: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: user_events

compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-lkp/202402141240.cc5aba78-oliver.s...@intel.com

[  106.969333][ T2278] BUG: KASAN: slab-use-after-free in user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[  106.970079][ T2278] Read of size 8 at addr 88816644ef38 by task abi_test/2278
[  106.970788][ T2278]
[  106.971058][ T2278] CPU: 2 PID: 2278 Comm: abi_test Not tainted 6.7.0-rc8-1-gfecc001d587c #1
[  106.971881][ T2278] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
[  106.972829][ T2278] Call Trace:
[  106.973185][ T2278]
[  106.973514][ T2278] dump_stack_lvl (lib/dump_stack.c:108)
[  106.973966][ T2278] print_address_description+0x2c/0x3a0
[  106.974597][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[  106.975099][ T2278] print_report (mm/kasan/report.c:476)
[  106.975542][ T2278] ? kasan_addr_to_slab (mm/kasan/common.c:35)
[  106.976025][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[  106.976531][ T2278] kasan_report (mm/kasan/report.c:590)
[  106.976978][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[  106.977481][ T2278] user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[  106.977970][ T2278] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
[  106.978441][ T2278] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[  106.978889][ T2278] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[  106.979462][ T2278] RIP: 0033:0x7f2e121c8b5b
[  106.979907][ T2278] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00

All code
========
   0:	00 48 89             	add    %cl,-0x77(%rax)
   3:	44 24 18             	rex.R and $0x18,%al
   6:	31 c0                	xor    %eax,%eax
   8:	48 8d 44 24 60       	lea    0x60(%rsp),%rax
   d:	c7 04 24 10 00 00 00 	movl   $0x10,(%rsp)
  14:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
  19:	48 8d 44 24 20       	lea    0x20(%rsp),%rax
  1e:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
  23:	b8 10 00 00 00       	mov    $0x10,%eax
  28:	0f 05                	syscall
  2a:*	89 c2                	mov    %eax,%edx		<-- trapping instruction
  2c:	3d 00 f0 ff ff       	cmp    $0xf000,%eax
  31:	77 1c                	ja     0x4f
  33:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
  38:	64                   	fs
  39:	48                   	rex.W
  3a:	2b                   	.byte 0x2b
  3b:	04 25                	add    $0x25,%al
  3d:	28 00                	sub    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
   0:	89 c2                	mov    %eax,%edx
   2:	3d 00 f0 ff ff       	cmp    $0xf000,%eax
   7:	77 1c                	ja     0x25
   9:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
   e:	64                   	fs
   f:	48                   	rex.W
  10:	2b                   	.byte 0x2b
  11:	04 25                	add    $0x25,%al
  13:	28 00                	sub    %al,(%rax)
	...

[  106.981608][ T2278] RSP: 002b:7ffcb0ba5ed0 EFLAGS: 0246 ORIG_RAX: 0010
[  106.982385][ T2278] RAX: ffda RBX: 7ffcb0ba6228 RCX: 7f2e121c8b5b
[  106.983128][ T2278] RDX: 564d434bc6fe RSI: 4
Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()
On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:
> Hi Jarkko
>
> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen wrote:
>
> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> >> From: Kristen Carlson Accardi
> >>
> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> >> reclaim pages used in the same cgroup to make room for new allocations.
> >> This is analogous to the behavior that the global reclaimer is triggered
> >> when the global usage is close to total available EPC.
> >>
> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> >> whether synchronous reclaim is allowed or not. And trigger the
> >> synchronous/asynchronous reclamation flow accordingly.
> >>
> >> Note at this point, all reclaimable EPC pages are still tracked in the
> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
> >> is activated yet.
> >>
> >> Co-developed-by: Sean Christopherson
> >> Signed-off-by: Sean Christopherson
> >> Signed-off-by: Kristen Carlson Accardi
> >> Co-developed-by: Haitao Huang
> >> Signed-off-by: Haitao Huang
> >> ---
> >> V7:
> >> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> >> ---
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
> >>  arch/x86/kernel/cpu/sgx/main.c       |  2 +-
> >>  3 files changed, 27 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> index d399fda2b55e..abf74fdb12b4 100644
> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> @@ -184,13 +184,35 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> >>  /**
> >>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
> >>   * @epc_cg:	The EPC cgroup to be charged for the page.
> >> + * @reclaim:	Whether or not synchronous reclaim is allowed
> >>   * Return:
> >>   * * %0 - If successfully charged.
> >>   * * -errno - for failures.
> >>   */
> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
> >>  {
> >> -	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> >> +	for (;;) {
> >> +		if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> >> +					PAGE_SIZE))
> >> +			break;
> >> +
> >> +		if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> >> +			return -ENOMEM;
> >> +
> >> +		if (signal_pending(current))
> >> +			return -ERESTARTSYS;
> >> +
> >> +		if (!reclaim) {
> >> +			queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
> >> +			return -EBUSY;
> >> +		}
> >> +
> >> +		if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> >> +			/* All pages were too young to reclaim, try again a little later */
> >> +			schedule();
> >
> > This will be total pain to backtrack after a while when something
> > needs to be changed so there definitely should be inline comments
> > addressing each branch condition.
> >
> > I'd rethink this as:
> >
> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
> >    iteration with the new "reclaim" parameter.
> > 2. Add a new sgx_epc_group_try_charge_reclaim() function.
> >
> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> > same loop calling internal __sgx_epc_cgroup_try_charge() with
> > different parameters. That is totally acceptable.
> >
> > Please also add my suggested-by.
> >
> > BR, Jarkko
>
> For #2:
> The only caller of this function, sgx_alloc_epc_page(), has the same
> boolean which is passed into this function.

I know. This would be a good opportunity to fix that up. Large patch sets
should try to make the space for their feature the best possible, and thus
also clean up the code base overall.

> If we separate it into sgx_epc_cgroup_try_charge() and
> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the
> if/else branches. So separation here seems not to help?

Of course it does. It makes the code in that location self-documenting and
easier to remember what it does.

BR, Jarkko
Re: [PATCH v7 10/36] ftrace/function_graph: Pass fgraph_ops to function graph callbacks
On Wed, 7 Feb 2024 00:09:21 +0900
"Masami Hiramatsu (Google)" wrote:

> From: Steven Rostedt (VMware)
>
> Pass the fgraph_ops structure to the function graph callbacks. This will
> allow callbacks to add a descriptor to a fgraph_ops private field that will
> be added in the future and use it for the callbacks. This will be useful
> when more than one callback can be registered to the function graph tracer.
>
> Signed-off-by: Steven Rostedt (VMware)
> Signed-off-by: Masami Hiramatsu (Google)
> ---
> Changes in v2:
>  - cleanup to set argument name on function prototype.
> ---

This patch fails to compile without this change:

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e35a941a5af3..47b461b1cf7e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -381,7 +381,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		if (gops == &fgraph_stub)
 			continue;
 
-		if (gops->entryfunc(&trace))
+		if (gops->entryfunc(&trace, gops))
 			bitmap |= BIT(i);
 	}

-- Steve
Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
Hi Vincent,

kernel test robot noticed the following build errors:

[auto build test ERROR on ca185770db914869ff9fe773bac5e0e5e4165b83]

url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/ring-buffer-Zero-ring-buffer-sub-buffers/20240213-195302
base: ca185770db914869ff9fe773bac5e0e5e4165b83
patch link: https://lore.kernel.org/r/20240213114945.3528801-3-vdonnefort%40google.com
patch subject: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
config: i386-buildonly-randconfig-001-20240214 (https://download.01.org/0day-ci/archive/20240214/202402140910.tfs9k0yr-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240214/202402140910.tfs9k0yr-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202402140910.tfs9k0yr-...@intel.com/

All errors (new ones prefixed by >>):

>> kernel/trace/ring_buffer.c:6185:2: error: member reference type 'struct mutex' is not a pointer; did you mean to use '.'?
    6185 |         lockdep_assert_held(cpu_buffer->mapping_lock);
         |         ^
   include/linux/lockdep.h:267:17: note: expanded from macro 'lockdep_assert_held'
     267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
         |                        ~~~^~
   include/linux/lockdep.h:234:52: note: expanded from macro 'lockdep_is_held'
     234 | #define lockdep_is_held(lock)           lock_is_held(&(lock)->dep_map)
         |                                                              ^
   include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
     261 |         do { WARN_ON(debug_locks && !(cond)); } while (0)
         |                                      ~^~
   include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
     123 |         int __ret_warn_on = !!(condition);                      \
         |                                ^
>> kernel/trace/ring_buffer.c:6185:2: error: cannot take the address of an rvalue of type 'struct lockdep_map'
    6185 |         lockdep_assert_held(cpu_buffer->mapping_lock);
         |         ^
   include/linux/lockdep.h:267:17: note: expanded from macro 'lockdep_assert_held'
     267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
         |                        ~~~^~
   include/linux/lockdep.h:234:45: note: expanded from macro 'lockdep_is_held'
     234 | #define lockdep_is_held(lock)           lock_is_held(&(lock)->dep_map)
         |                                                      ^
   include/linux/lockdep.h:261:32: note: expanded from macro 'lockdep_assert'
     261 |         do { WARN_ON(debug_locks && !(cond)); } while (0)
         |                                      ~^~
   include/asm-generic/bug.h:123:25: note: expanded from macro 'WARN_ON'
     123 |         int __ret_warn_on = !!(condition);                      \
         |                                ^
   2 errors generated.


vim +6185 kernel/trace/ring_buffer.c

  6174	
  6175	/*
  6176	 * Fast-path for rb_buffer_(un)map(). Called whenever the meta-page doesn't need
  6177	 * to be set-up or torn-down.
  6178	 */
  6179	static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
  6180	                               struct ring_buffer_per_cpu *cpu_buffer,
  6181	                               bool inc)
  6182	{
  6183	        unsigned long flags;
  6184	
> 6185	        lockdep_assert_held(cpu_buffer->mapping_lock);
  6186	
  6187	        if (inc && cpu_buffer->mapped == UINT_MAX)
  6188	                return -EBUSY;
  6189	
  6190	        if (WARN_ON(!inc && cpu_buffer->mapped == 0))
  6191	                return -EINVAL;
  6192	
  6193	        mutex_lock(&buffer->mutex);
  6194	        raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
  6195	
  6196	        if (inc)
  6197	                cpu_buffer->mapped++;
  6198	        else
  6199	                cpu_buffer->mapped--;
  6200	
  6201	        raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
  6202	        mutex_unlock(&buffer->mutex);
  6203	
  6204	        return 0;
  6205	}
  6206	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation
On Tue, 2024-02-13 at 11:52 -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
>
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
>
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
>
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
>
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.
>
> Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
>
> Signed-off-by: Steven Rostedt (Google)

Patch looks good to me.

Reviewed-by: Tim Chen

> ---
> Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/
>
> -- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.
>
Re: [PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation
On Tue, 13 Feb 2024 11:52:32 -0500
Steven Rostedt wrote:

> From: "Steven Rostedt (Google)"
>
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
>
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
>
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
>
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
>
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.
>
> Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
>

OK, this looks good to me.

Acked-by: Masami Hiramatsu (Google)

Thank you,

> Signed-off-by: Steven Rostedt (Google)
> ---
> Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/
>
> -- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.
>
>  kernel/trace/trace_sched_switch.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> index e4fbcc3bede5..5f3e9bc87079 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -175,6 +175,10 @@ struct saved_cmdlines_buffer {
>  };
>  static struct saved_cmdlines_buffer *savedcmd;
>  
> +/* Holds the size of a cmdline and pid element */
> +#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)			\
> +	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
> +
>  static inline char *get_saved_cmdlines(int idx)
>  {
>  	return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
> @@ -201,7 +205,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	int order;
>  
>  	/* Figure out how much is needed to hold the given number of cmdlines */
> -	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> +	orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
>  	order = get_order(orig_size);
>  	size = 1 << (order + PAGE_SHIFT);
>  	page = alloc_pages(GFP_KERNEL, order);
> @@ -212,16 +216,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	memset(s, 0, sizeof(*s));
>  
>  	/* Round up to actual allocation */
> -	val = (size - sizeof(*s)) / TASK_COMM_LEN;
> +	val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
>  	s->cmdline_num = val;
>  
> -	s->map_cmdline_to_pid = kmalloc_array(val,
> -					      sizeof(*s->map_cmdline_to_pid),
> -					      GFP_KERNEL);
> -	if (!s->map_cmdline_to_pid) {
> -		free_saved_cmdlines_buffer(s);
> -		return NULL;
> -	}
> +	/* Place map_cmdline_to_pid array right after saved_cmdlines */
> +	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
>  
>  	s->cmdline_idx = 0;
>  	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
> -- 
> 2.43.0
>
>

-- 
Masami Hiramatsu (Google)
Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
On Tue, 13 Feb 2024 15:53:09 -0500
Steven Rostedt wrote:

> On Tue, 13 Feb 2024 11:49:41 +
> Vincent Donnefort wrote:
>
> Did you test with lockdep?
>
> > +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> > +			       struct ring_buffer_per_cpu *cpu_buffer,
> > +			       bool inc)
> > +{
> > +	unsigned long flags;
> > +
> > +	lockdep_assert_held(cpu_buffer->mapping_lock);
>
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function '__rb_inc_dec_mapped':
> /work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type argument of '->' (have 'struct mutex')
>   234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
>       |                                                     ^~
> /work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in definition of macro 'WARN_ON'
>   123 |         int __ret_warn_on = !!(condition);                      \
>       |                                ^
> /work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion of macro 'lockdep_assert'
>   267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
>       |         ^~
> /work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion of macro 'lockdep_is_held'
>   267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
>       |                        ^~~
> /work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in expansion of macro 'lockdep_assert_held'
>  6167 |         lockdep_assert_held(cpu_buffer->mapping_lock);
>       |         ^~~
>
> I believe that is supposed to be:
>
> 	lockdep_assert_held(&cpu_buffer->mapping_lock);

If this is the only issue with this series, I may just fix up the patch myself.
Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
On Tue, Feb 13, 2024 at 02:37:49PM -0600, Tanmay Shah wrote:
> Hello,
>
> Thanks for the reviews, please find my comments below.
>
> On 2/13/24 1:20 PM, Rob Herring wrote:
> > On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote:
> > > From: Radhey Shyam Pandey
> > >
> > > Introduce bindings for TCM memory address space on AMD-xilinx Zynq
> > > UltraScale+ platform. It will help in defining TCM in device-tree
> > > and make its access platform agnostic and data-driven.
> > >
> > > Tightly-coupled memories (TCMs) are low-latency memory that provides
> > > predictable instruction execution and predictable data load/store
> > > timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
> > > banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> > >
> > > The TCM resources (reg, reg-names and power-domain) are documented for
> > > each TCM in the R5 node. The reg and reg-names are made required
> > > properties as we don't want to hardcode TCM addresses for future
> > > platforms, and for zu+ legacy implementation will ensure that the
> > > old dts w/o reg/reg-names works and a stable ABI is maintained.
> > >
> > > It also extends the examples for TCM split and lockstep modes.
> > >
> > > Signed-off-by: Radhey Shyam Pandey
> > > Signed-off-by: Tanmay Shah
> > > ---
> > >
> > > Changes in v10:
> > >   - modify number of "reg", "reg-names" and "power-domains" entries
> > >     based on cluster mode
> > >   - Add extra optional atcm and btcm in "reg" property for lockstep mode
> > >   - Add "reg-names" for extra optional atcm and btcm for lockstep mode
> > >   - Drop previous Ack as bindings has new change
> > >
> > > Changes in v9:
> > >   - None
> > > Changes in v8:
> > >   - None
> > > Changes in v7:
> > >   - None
> > > Changes in v6:
> > >   - None
> > > Changes in v5:
> > >   - None
> > >
> > > Changes in v4:
> > >   - Use address-cells and size-cells value 2
> > >   - Modify ranges property as per new value of address-cells
> > >     and size-cells
> > >   - Modify child node "reg" property accordingly
> > >   - Remove previous ack for further review
> > >
> > > v4 link: https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/
> > >
> > >  .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
> > >  1 file changed, 170 insertions(+), 22 deletions(-)
> > >
> >
> > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> > on your patch (DT_CHECKER_FLAGS is new in v5.13):
> >
> > yamllint warnings/errors:
> > ./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13: [warning] wrong indentation: expected 10 but found 12 (indentation)
>
> Ack. I will fix this.
>
> However, can I still get reviews on the patch itself so if something else
> needs to be fixed I can fix it in the next revision as well.
>
> Also, I tried to run yamllint with the following command:
>
> make DT_CHECKER_FLAGS=-m dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml O=../build/zynqmp/linux-next/
>
> However, I see the following logs without any error on the bindings:
>
>   LINT    Documentation/devicetree/bindings
> invalid config: unknown option "required" for rule "quoted-strings"
> xargs: /usr/bin/yamllint: exited with status 255; aborting
>   CHKDT   Documentation/devicetree/bindings/processed-schema.json
>   SCHEMA  Documentation/devicetree/bindings/processed-schema.json
>   DTEX    Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dts
>   DTC_CHK Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dtb
>
> I am not sure if my system is missing something, but the yamllint tool is failing.

"unknown option" means old version of yamllint.

Rob
Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()
On Thu, Feb 01, 2024 at 10:13:54AM -0800, Luis Chamberlain wrote:
>
> While you're at it, if you want to try it, you could see if you can
> improve the situation more by looking at symbol_get() users that remain
> and seeing if you can instead fix it with proper Kconfig dependency and
> at build time. Then we can just remove it as well.
>
>   Luis

Sorry for the late reply.

Luis, can you give more details of your idea? I re-read it once, then came
back and still don't understand.

I see that there are ~10 users of symbol_get() currently. Do you want to
stringify symbol names at build time to completely remove symbol_get()
from module.h? Correct me if I'm wrong, since using a function which is
not declared anywhere sounds confusing.

-- 
Andrew Kanner
Re: CPU data cache across reboot/kexec for pmem/dax devices
On 2024-02-09 15:15, Dan Williams wrote:
> Mathieu Desnoyers wrote:
>> Hi Dan,
>>
>> In the context of extracting user-space trace data when the kernel crashes,
>> the LTTng user-space tracer recommends using nvdimm/pmem to reserve an area
>> of physical (volatile) RAM at boot (memmap=nn[KMG]!ss[KMG]), and use the
>> resulting device to create/mount a dax-enabled fs (e.g. ext4).
>>
>> We then use this filesystem to mmap() the shared memory files for the tracer.
>>
>> I want to make sure that the very last events from the userspace tracer
>> written to the memory mapped buffers (mmap()) by userspace are present
>> after a warm-reboot (or kexec/kdump). Note that the LTTng user-space
>> tracer (LTTng-UST) does *not* issue any clflush (or equivalent
>> pmem_persist() from libpmem) for performance reasons: ring buffer data
>> is usually overwritten many times before the system actually crashes,
>> and the only thing we really need to make sure is that the cache lines
>> are not invalidated without write back.
>>
>> So I understand that the main use-case for pmem is nvdimm, and that in
>> order to guarantee persistence of the data on power off an explicit
>> pmem_persist() is needed after each "transaction", but for the needs of
>> tracing, is there some kind of architectural guarantee that the data
>> present in the cpu data cache is not invalidated prior to write back in
>> each of those scenarios ?
>>
>> - reboot with bios explicitly not clearing memory,
>
> This one gives me pause, because a trip through the BIOS typically means
> lots of resets and other low level magic, so this would likely require
> pushing dirty data out of CPU caches prior to entering the BIOS code
> paths. So this either needs explicit cache flushing or mapping the
> memory with write-through semantics. That latter one is not supported
> in the stack today.

I would favor the explicit cache flushing approach over write-through
semantics for performance reasons: typically the ring buffers are
overwritten, so always storing the data to memory would be wasteful.

But I would rather do the cache flushing only on crash (die oops/panic
notifiers) rather than require the tracer to flush cache lines immediately
after each event is produced, again for performance reasons.

I have done a crude attempt at registering die oops/panic notifiers from
the pmem driver, and flush all pmem areas to memory when die oops/panic
notifiers are called (see the untested patch below). I wonder if this is
something that would be generally useful ?

>> - kexec/kdump.
>
> This should maintain the state of CPU caches. As far as the CPU is
> concerned it is just long jumping into a new kernel in memory without
> resetting any CPU cache state.

Good to know that this scenario is expected to "Just Work".

Thanks,

Mathieu

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e9898457a7bd..b92f18607d39 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,12 +26,18 @@
 #include
 #include
 #include
+#include
 #include
 #include "pmem.h"
 #include "btt.h"
 #include "pfn.h"
 #include "nd.h"
 
+static int pmem_die_handler(struct notifier_block *self,
+		unsigned long ev, void *unused);
+static int pmem_panic_handler(struct notifier_block *self,
+		unsigned long ev, void *unused);
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
 	/*
@@ -423,6 +429,9 @@ static void pmem_release_disk(void *__pmem)
 {
 	struct pmem_device *pmem = __pmem;
 
+	atomic_notifier_chain_unregister(&panic_notifier_list,
+			&pmem->panic_notifier);
+	unregister_die_notifier(&pmem->die_notifier);
 	dax_remove_host(pmem->disk);
 	kill_dax(pmem->dax_dev);
 	put_dax(pmem->dax_dev);
@@ -573,9 +582,20 @@ static int pmem_attach_disk(struct device *dev,
 			goto out_cleanup_dax;
 		dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	}
-	rc = device_add_disk(dev, disk, pmem_attribute_groups);
+	pmem->die_notifier.notifier_call = pmem_die_handler;
+	pmem->die_notifier.priority = -INT_MAX;
+	rc = register_die_notifier(&pmem->die_notifier);
 	if (rc)
 		goto out_remove_host;
+	pmem->panic_notifier.notifier_call = pmem_panic_handler;
+	pmem->panic_notifier.priority = -INT_MAX;
+	rc = atomic_notifier_chain_register(&panic_notifier_list,
+			&pmem->panic_notifier);
+	if (rc)
+		goto out_unregister_die;
+	rc = device_add_disk(dev, disk, pmem_attribute_groups);
+	if (rc)
+		goto out_unregister_panic;
 	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
 		return -ENOMEM;
@@ -587,6 +607,11 @@ static int pmem_attach_disk(struct device *dev,
 		dev_warn(dev, "'badblocks' notification disabled\n");
 	return 0;
 
+out_unregister_panic:
+	atomic_notifier_chain_unregister(&panic_notifier_list,
+			&pmem->panic_notifier);
+out_unregister_d
Re: [PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
On Tue, 13 Feb 2024 11:49:41 +
Vincent Donnefort wrote:

Did you test with lockdep?

> +static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
> +			       struct ring_buffer_per_cpu *cpu_buffer,
> +			       bool inc)
> +{
> +	unsigned long flags;
> +
> +	lockdep_assert_held(cpu_buffer->mapping_lock);

/work/git/linux-trace.git/kernel/trace/ring_buffer.c: In function '__rb_inc_dec_mapped':
/work/git/linux-trace.git/include/linux/lockdep.h:234:61: error: invalid type argument of '->' (have 'struct mutex')
  234 | #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
      |                                                     ^~
/work/git/linux-trace.git/include/asm-generic/bug.h:123:32: note: in definition of macro 'WARN_ON'
  123 |         int __ret_warn_on = !!(condition);                      \
      |                                ^
/work/git/linux-trace.git/include/linux/lockdep.h:267:9: note: in expansion of macro 'lockdep_assert'
  267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
      |         ^~
/work/git/linux-trace.git/include/linux/lockdep.h:267:24: note: in expansion of macro 'lockdep_is_held'
  267 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
      |                        ^~~
/work/git/linux-trace.git/kernel/trace/ring_buffer.c:6167:9: note: in expansion of macro 'lockdep_assert_held'
 6167 |         lockdep_assert_held(cpu_buffer->mapping_lock);
      |         ^~~

I believe that is supposed to be:

	lockdep_assert_held(&cpu_buffer->mapping_lock);

-- Steve

> +
> +	if (inc && cpu_buffer->mapped == UINT_MAX)
> +		return -EBUSY;
> +
> +	if (WARN_ON(!inc && cpu_buffer->mapped == 0))
> +		return -EINVAL;
> +
> +	mutex_lock(&buffer->mutex);
> +	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
> +	if (inc)
> +		cpu_buffer->mapped++;
> +	else
> +		cpu_buffer->mapped--;
> +
> +	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +	mutex_unlock(&buffer->mutex);
> +
> +	return 0;
> +}
Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
Hello, Thanks for reviews please find my comments below. On 2/13/24 1:20 PM, Rob Herring wrote: > On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote: > > From: Radhey Shyam Pandey > > > > Introduce bindings for TCM memory address space on AMD-xilinx Zynq > > UltraScale+ platform. It will help in defining TCM in device-tree > > and make it's access platform agnostic and data-driven. > > > > Tightly-coupled memories(TCMs) are low-latency memory that provides > > predictable instruction execution and predictable data load/store > > timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory > > banks on the ATCM and BTCM ports, for a total of 128 KB of memory. > > > > The TCM resources(reg, reg-names and power-domain) are documented for > > each TCM in the R5 node. The reg and reg-names are made as required > > properties as we don't want to hardcode TCM addresses for future > > platforms and for zu+ legacy implementation will ensure that the > > old dts w/o reg/reg-names works and stable ABI is maintained. > > > > It also extends the examples for TCM split and lockstep modes. 
> > > > Signed-off-by: Radhey Shyam Pandey > > Signed-off-by: Tanmay Shah > > --- > > > > Changes in v10: > > - modify number of "reg", "reg-names" and "power-domains" entries > > based on cluster mode > > - Add extra optional atcm and btcm in "reg" property for lockstep mode > > - Add "reg-names" for extra optional atcm and btcm for lockstep mode > > - Drop previous Ack as bindings has new change > > > > Changes in v9: > > - None > > Changes in v8: > > - None > > Changes in v7: > > - None > > Changes in v6: > > - None > > Changes in v5: > > - None > > > > Changes in v4: > > - Use address-cells and size-cells value 2 > > - Modify ranges property as per new value of address-cells > > and size-cells > > - Modify child node "reg" property accordingly > > - Remove previous ack for further review > > > > v4 link: > > https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/ > > > > .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 -- > > 1 file changed, 170 insertions(+), 22 deletions(-) > > > > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' > on your patch (DT_CHECKER_FLAGS is new in v5.13): > > yamllint warnings/errors: > ./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13: > [warning] wrong indentation: expected 10 but found 12 (indentation) Ack. I will fix this. However, can I still get reviews on patch itself so if something else needs to be fixed I can fix in next revision as well. 
Also, I tried to run yamllint with the following command: make DT_CHECKER_FLAGS=-m dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml O=../build/zynqmp/linux-next/ However, I see the following logs without any error on the bindings: LINT Documentation/devicetree/bindings invalid config: unknown option "required" for rule "quoted-strings" *xargs: /usr/bin/yamllint: exited with status 255; aborting* CHKDT Documentation/devicetree/bindings/processed-schema.json SCHEMA Documentation/devicetree/bindings/processed-schema.json DTEX Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dts DTC_CHK Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.example.dtb I am not sure if my system is missing something, but the yamllint tool is failing. I would appreciate it if someone could point me to a quick fix if this is a known issue that has already been resolved. Thanks, Tanmay > dtschema/dtc warnings/errors: > > doc reference errors (make refcheckdocs): > > See > https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240213175450.3097308-3-tanmay.s...@amd.com > > The base for the series is generally the latest rc1. A different dependency > should be noted in *this* patch. > > If you already ran 'make dt_binding_check' and didn't see the above > error(s), then make sure 'yamllint' is installed and dt-schema is up to > date: > > pip3 install dtschema --upgrade > > Please check and re-submit after running the above command yourself. Note > that DT_SCHEMA_FILES can be set to your schema file to speed up checking > your schema. However, it must be unset to test all examples with your schema. >
Re: [PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
On Tue, 13 Feb 2024 09:54:48 -0800, Tanmay Shah wrote: > From: Radhey Shyam Pandey > > Introduce bindings for TCM memory address space on AMD-xilinx Zynq > UltraScale+ platform. It will help in defining TCM in device-tree > and make it's access platform agnostic and data-driven. > > Tightly-coupled memories(TCMs) are low-latency memory that provides > predictable instruction execution and predictable data load/store > timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory > banks on the ATCM and BTCM ports, for a total of 128 KB of memory. > > The TCM resources(reg, reg-names and power-domain) are documented for > each TCM in the R5 node. The reg and reg-names are made as required > properties as we don't want to hardcode TCM addresses for future > platforms and for zu+ legacy implementation will ensure that the > old dts w/o reg/reg-names works and stable ABI is maintained. > > It also extends the examples for TCM split and lockstep modes. > > Signed-off-by: Radhey Shyam Pandey > Signed-off-by: Tanmay Shah > --- > > Changes in v10: > - modify number of "reg", "reg-names" and "power-domains" entries > based on cluster mode > - Add extra optional atcm and btcm in "reg" property for lockstep mode > - Add "reg-names" for extra optional atcm and btcm for lockstep mode > - Drop previous Ack as bindings has new change > > Changes in v9: > - None > Changes in v8: > - None > Changes in v7: > - None > Changes in v6: > - None > Changes in v5: > - None > > Changes in v4: > - Use address-cells and size-cells value 2 > - Modify ranges property as per new value of address-cells > and size-cells > - Modify child node "reg" property accordingly > - Remove previous ack for further review > > v4 link: > https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/ > > .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 -- > 1 file changed, 170 insertions(+), 22 deletions(-) > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch 
(DT_CHECKER_FLAGS is new in v5.13): yamllint warnings/errors: ./Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml:118:13: [warning] wrong indentation: expected 10 but found 12 (indentation) dtschema/dtc warnings/errors: doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240213175450.3097308-3-tanmay.s...@amd.com The base for the series is generally the latest rc1. A different dependency should be noted in *this* patch. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit after running the above command yourself. Note that DT_SCHEMA_FILES can be set to your schema file to speed up checking your schema. However, it must be unset to test all examples with your schema.
Re: [Linux-stm32] [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware
On Tue, 13 Feb 2024 at 08:48, Arnaud POULIQUEN wrote: > > Hello Mathieu, > > On 2/5/24 10:13, Arnaud POULIQUEN wrote: > > > > > > On 2/2/24 20:53, Mathieu Poirier wrote: > >> On Thu, Feb 01, 2024 at 07:33:35PM +0100, Arnaud POULIQUEN wrote: > >>> > >>> > >>> On 2/1/24 17:02, Mathieu Poirier wrote: > On Thu, Feb 01, 2024 at 04:06:37PM +0100, Arnaud POULIQUEN wrote: > > hello Mathieu, > > > > On 1/31/24 19:52, Mathieu Poirier wrote: > >> On Tue, Jan 30, 2024 at 10:13:48AM +0100, Arnaud POULIQUEN wrote: > >>> > >>> > >>> On 1/26/24 18:11, Mathieu Poirier wrote: > On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote: > > The new TEE remoteproc device is used to manage remote firmware in a > > secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is > > introduced to delegate the loading of the firmware to the trusted > > execution context. In such cases, the firmware should be signed and > > adhere to the image format defined by the TEE. > > > > Signed-off-by: Arnaud Pouliquen > > --- > > V1 to V2 update: > > - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as > > detected by > > the kernel test robot: > > WARNING: unmet direct dependencies detected for TEE_REMOTEPROC > > Depends on [n]: REMOTEPROC [=y] && OPTEE [=n] > > Selected by [y]: > > - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && > > REMOTEPROC [=y] > > - Fix initialized trproc variable in stm32_rproc_probe > > --- > > drivers/remoteproc/stm32_rproc.c | 149 > > +-- > > 1 file changed, 144 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/remoteproc/stm32_rproc.c > > b/drivers/remoteproc/stm32_rproc.c > > index fcc0001e2657..cf6a21bac945 100644 > > --- a/drivers/remoteproc/stm32_rproc.c > > +++ b/drivers/remoteproc/stm32_rproc.c > > @@ -20,6 +20,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > #include "remoteproc_internal.h" > > @@ -49,6 +50,9 @@ > > #define M4_STATE_STANDBY 4 > > #define M4_STATE_CRASH 5 > > > > +/* Remote 
processor unique identifier aligned with the Trusted > > Execution Environment definitions */ > > +#define STM32_MP1_M4_PROC_ID0 > > + > > struct stm32_syscon { > > struct regmap *map; > > u32 reg; > > @@ -90,6 +94,8 @@ struct stm32_rproc { > > struct stm32_mbox mb[MBOX_NB_MBX]; > > struct workqueue_struct *workqueue; > > bool hold_boot_smc; > > + bool fw_loaded; > > + struct tee_rproc *trproc; > > void __iomem *rsc_va; > > }; > > > > @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc > > *rproc) > > return err; > > } > > > > +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc, > > + const struct firmware *fw) > > +{ > > + struct stm32_rproc *ddata = rproc->priv; > > + unsigned int ret = 0; > > + > > + if (rproc->state == RPROC_DETACHED) > > + return 0; > > + > > + ret = tee_rproc_load_fw(ddata->trproc, fw); > > + if (!ret) > > + ddata->fw_loaded = true; > > + > > + return ret; > > +} > > + > > +static int stm32_rproc_tee_elf_load(struct rproc *rproc, > > + const struct firmware *fw) > > +{ > > + struct stm32_rproc *ddata = rproc->priv; > > + unsigned int ret; > > + > > + /* > > + * This function can be called by remote proc for recovery > > + * without the sanity check. In this case we need to load the > > firmware > > + * else nothing done here as the firmware has been preloaded > > for the > > + * sanity check to be able to parse it for the resource table. > > + */ > > This comment is very confusing - please consider refactoring. > > > + if (ddata->fw_loaded) > > + return 0; > > + > > I'm not sure about keeping a flag to indicate the status of the > loaded firmware. > It is not done for the non-secure method, I don't see why it would > be needed for > the secure one. > > >>> > >>> The difference is on the sanity check. > >>>
[PATCH] MAINTAINERS: add Eugenio Pérez as reviewer
Add myself as a reviewer of some VirtIO areas I'm interested in. Until this point I've been manually scanning the list looking for series that touch these areas. Add myself to make this task easier. Signed-off-by: Eugenio Pérez --- MAINTAINERS | 4 1 file changed, 4 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 61117c3afa80..d0789ecc4f70 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23219,6 +23219,7 @@ M: "Michael S. Tsirkin" M: Jason Wang R: Paolo Bonzini R: Stefan Hajnoczi +R: Eugenio Pérez L: virtualizat...@lists.linux.dev S: Maintained F: drivers/block/virtio_blk.c @@ -23237,6 +23238,7 @@ VIRTIO CORE AND NET DRIVERS M: "Michael S. Tsirkin" M: Jason Wang R: Xuan Zhuo +R: Eugenio Pérez L: virtualizat...@lists.linux.dev S: Maintained F: Documentation/ABI/testing/sysfs-bus-vdpa @@ -23277,6 +23279,7 @@ VIRTIO FILE SYSTEM M: Vivek Goyal M: Stefan Hajnoczi M: Miklos Szeredi +R: Eugenio Pérez L: virtualizat...@lists.linux.dev L: linux-fsde...@vger.kernel.org S: Supported @@ -23310,6 +23313,7 @@ F: include/uapi/linux/virtio_gpu.h VIRTIO HOST (VHOST) M: "Michael S. Tsirkin" M: Jason Wang +R: Eugenio Pérez L: k...@vger.kernel.org L: virtualizat...@lists.linux.dev L: net...@vger.kernel.org -- 2.43.0
Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL
On Tue, 13 Feb 2024 14:57:49 +0100 Eric Dumazet wrote: > Please note that adding other sysfs entries is expensive for workloads > creating/deleting netdev and netns often. > > I _think_ we should find a way for not creating > /sys/class/net//queues/tx-{Q}/byte_queue_limits directory > and files > for non BQL enabled devices (like loopback !) We should try, see if anyone screams. We could use IFF_NO_QUEUE and NETIF_F_LLTX as a proxy for "device doesn't have a real queue, so BQL would be pointless"? Obviously better to annotate the drivers which do have BQL support, but there are >50 of them on a quick count..
[PATCH v10 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
From: Radhey Shyam Pandey Introduce bindings for TCM memory address space on AMD-xilinx Zynq UltraScale+ platform. It will help in defining TCM in device-tree and make it's access platform agnostic and data-driven. Tightly-coupled memories(TCMs) are low-latency memory that provides predictable instruction execution and predictable data load/store timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory banks on the ATCM and BTCM ports, for a total of 128 KB of memory. The TCM resources(reg, reg-names and power-domain) are documented for each TCM in the R5 node. The reg and reg-names are made as required properties as we don't want to hardcode TCM addresses for future platforms and for zu+ legacy implementation will ensure that the old dts w/o reg/reg-names works and stable ABI is maintained. It also extends the examples for TCM split and lockstep modes. Signed-off-by: Radhey Shyam Pandey Signed-off-by: Tanmay Shah --- Changes in v10: - modify number of "reg", "reg-names" and "power-domains" entries based on cluster mode - Add extra optional atcm and btcm in "reg" property for lockstep mode - Add "reg-names" for extra optional atcm and btcm for lockstep mode - Drop previous Ack as bindings has new change Changes in v9: - None Changes in v8: - None Changes in v7: - None Changes in v6: - None Changes in v5: - None Changes in v4: - Use address-cells and size-cells value 2 - Modify ranges property as per new value of address-cells and size-cells - Modify child node "reg" property accordingly - Remove previous ack for further review v4 link: https://lore.kernel.org/all/20230829181900.2561194-2-tanmay.s...@amd.com/ .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 -- 1 file changed, 170 insertions(+), 22 deletions(-) diff --git a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml index 78aac69f1060..2de74307821e 100644 --- 
a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml +++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml @@ -20,9 +20,21 @@ properties: compatible: const: xlnx,zynqmp-r5fss + "#address-cells": +const: 2 + + "#size-cells": +const: 2 + + ranges: +description: | + Standard ranges definition providing address translations for + local R5F TCM address spaces to bus addresses. + xlnx,cluster-mode: $ref: /schemas/types.yaml#/definitions/uint32 enum: [0, 1, 2] +default: 1 description: | The RPU MPCore can operate in split mode (Dual-processor performance), Safety lock-step mode(Both RPU cores execute the same code in lock-step, @@ -37,7 +49,7 @@ properties: 2: single cpu mode patternProperties: - "^r5f-[a-f0-9]+$": + "^r5f@[0-9a-f]+$": type: object description: | The RPU is located in the Low Power Domain of the Processor Subsystem. @@ -54,9 +66,6 @@ patternProperties: compatible: const: xlnx,zynqmp-r5f - power-domains: -maxItems: 1 - mboxes: minItems: 1 items: @@ -101,35 +110,174 @@ patternProperties: required: - compatible - - power-domains -unevaluatedProperties: false +allOf: + - if: + properties: +xlnx,cluster-mode: +enum: + - 1 +then: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory +- description: extra ATCM memory in lockstep mode +- description: extra BTCM memory in lockstep mode + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 +- const: atcm1 +- const: btcm1 + +power-domains: + minItems: 2 + maxItems: 5 + + required: +- reg +- reg-names +- power-domains + +else: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 + +power-domains: + minItems: 2 + maxItems: 3 + + required: +- reg +- reg-names +- 
power-domains required: - compatible + - "#address-cells" + - "#size-cells" + - ranges additionalProperties: false examples: - | -remoteproc { -co
[PATCH v10 3/4] dts: zynqmp: add properties for TCM in remoteproc
Add properties as per the new bindings in the zynqmp remoteproc node to represent TCM address and size. This patch also adds an alternative remoteproc node to represent the remoteproc cluster in split mode. By default lockstep mode is enabled and users should disable it before using the split mode dts. Both device-tree nodes can't be used simultaneously; one of them must be disabled. For the zcu102-1.0 and zcu102-1.1 boards the remoteproc split mode dts node is enabled and the lockstep mode dts is disabled. Signed-off-by: Tanmay Shah --- Changes in v10: - Add individual tcm regions via "reg" and "reg-names" for lockstep mode - Add each tcm's power-domains in lockstep mode - Drop previous Ack as new change in this patchset Changes in v9: - fix rproc lockstep dts .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts | 8 +++ arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 65 +-- 2 files changed, 68 insertions(+), 5 deletions(-) diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts index c8f71a1aec89..495ca94b45db 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts +++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts @@ -14,6 +14,14 @@ / { compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", "xlnx,zynqmp"; }; +&rproc_split { + status = "okay"; +}; + +&rproc_lockstep { + status = "disabled"; +}; + &eeprom { #address-cells = <1>; #size-cells = <1>; diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi index eaba466804bc..c8a7fd0f3a1e 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi @@ -248,19 +248,74 @@ fpga_full: fpga-full { ranges; }; - remoteproc { + rproc_lockstep: remoteproc@ffe00000 { compatible = "xlnx,zynqmp-r5fss"; xlnx,cluster-mode = <1>; - r5f-0 { + #address-cells = <2>; + #size-cells = <2>; + + ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>, +<0x0 0x20000 0x0 0xffe20000 0x0 0x10000>, +<0x0 0x10000 0x0 0xffe10000 0x0 0x10000>, +<0x0 0x30000 0x0 0xffe30000 0x0 0x10000>; + + r5f@0 { + compatible = "xlnx,zynqmp-r5f"; + reg = <0x0 0x0 0x0 0x10000>, + <0x0 0x20000 0x0 0x10000>, + <0x0 0x10000 0x0 0x10000>, + <0x0 0x30000 0x0 0x10000>; + reg-names = "atcm0", "btcm0", "atcm1", "btcm1"; + power-domains = <&zynqmp_firmware PD_RPU_0>, + <&zynqmp_firmware PD_R5_0_ATCM>, + <&zynqmp_firmware PD_R5_0_BTCM>, + <&zynqmp_firmware PD_R5_1_ATCM>, + <&zynqmp_firmware PD_R5_1_BTCM>; + memory-region = <&rproc_0_fw_image>; + }; + + r5f@1 { + compatible = "xlnx,zynqmp-r5f"; + reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>; + reg-names = "atcm0", "btcm0"; + power-domains = <&zynqmp_firmware PD_RPU_1>, + <&zynqmp_firmware PD_R5_1_ATCM>, + <&zynqmp_firmware PD_R5_1_BTCM>; + memory-region = <&rproc_1_fw_image>; + }; + }; + + rproc_split: remoteproc-split@ffe00000 { + status = "disabled"; + compatible = "xlnx,zynqmp-r5fss"; + xlnx,cluster-mode = <0>; + + #address-cells = <2>; + #size-cells = <2>; + + ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>, +<0x0 0x20000 0x0 0xffe20000 0x0 0x10000>, +<0x1 0x0 0x0 0xffe90000 0x0 0x10000>, +<0x1 0x20000 0x0 0xffeb0000 0x0 0x10000>; + + r5f@0 { compatible = "xlnx,zynqmp-r5f"; - power-domains = <&zynqmp_firmware PD_RPU_0>; + reg = <0x0 0x0 0x0 0x10000>, <0x0 0x20000 0x0 0x10000>; + reg-names = "atcm0", "btcm0"; + power-domains = <&zynqmp_firmware PD_RPU_0>, + <&zynqmp_firmware PD_R5_0_ATCM>, + <&zynqmp_firmware PD_R5_0_BTCM>; memory-region = <&rproc_0_fw_image>; }; - r5f-1 { + r5f@1 { compatible = "xlnx,zynqmp-r5f"; - power-domains = <&zynqmp_firmware PD_RPU_1>; + reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20
[PATCH v10 4/4] remoteproc: zynqmp: parse TCM from device tree
ZynqMP TCM information was fixed in driver. Now ZynqMP TCM information is available in device-tree. Parse TCM information in driver as per new bindings. Signed-off-by: Tanmay Shah --- Changes in v10: - Remove redundant changes to handle TCM in lockstep mode Changes in v9: - Introduce new API to request and release core1 TCM power-domains in lockstep mode. This will be used during prepare -> add_tcm_banks callback to enable TCM in lockstep mode. - Parse TCM from device-tree in lockstep mode and split mode in uniform way. - Fix TCM representation in device-tree in lockstep mode. Changes in v8: - Remove pm_domains framework - Remove checking of pm_domain_id validation to power on/off tcm - Remove spurious change - parse power-domains property from device-tree and use EEMI calls to power on/off TCM instead of using pm domains framework Changes in v7: - move checking of pm_domain_id from previous patch - fix mem_bank_data memory allocation drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++-- 1 file changed, 107 insertions(+), 5 deletions(-) diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c index 42b0384d34f2..49e8eaf83fce 100644 --- a/drivers/remoteproc/xlnx_r5_remoteproc.c +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c @@ -74,8 +74,8 @@ struct mbox_info { }; /* - * Hardcoded TCM bank values. This will be removed once TCM bindings are - * accepted for system-dt specifications and upstreamed in linux kernel + * Hardcoded TCM bank values. This will stay in driver to maintain backward + * compatibility with device-tree that does not have TCM information. 
*/ static const struct mem_bank_data zynqmp_tcm_banks_split[] = { {0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */ @@ -757,6 +757,103 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev) return ERR_PTR(ret); } +static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster) +{ + int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count; + struct of_phandle_args out_args = {0}; + struct zynqmp_r5_core *r5_core; + struct platform_device *cpdev; + struct mem_bank_data *tcm; + struct device_node *np; + struct resource *res; + u64 abs_addr, size; + struct device *dev; + + for (i = 0; i < cluster->core_count; i++) { + r5_core = cluster->r5_cores[i]; + dev = r5_core->dev; + np = r5_core->np; + + pd_count = of_count_phandle_with_args(np, "power-domains", + "#power-domain-cells"); + + if (pd_count <= 0) { + dev_err(dev, "invalid power-domains property, %d\n", pd_count); + return -EINVAL; + } + + /* First entry in power-domains list is for r5 core, rest for TCM. */ + tcm_bank_count = pd_count - 1; + + if (tcm_bank_count <= 0) { + dev_err(dev, "invalid TCM count %d\n", tcm_bank_count); + return -EINVAL; + } + + r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count, + sizeof(struct mem_bank_data *), + GFP_KERNEL); + if (!r5_core->tcm_banks) + return -ENOMEM; + + r5_core->tcm_bank_count = tcm_bank_count; + for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) { + tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data), + GFP_KERNEL); + if (!tcm) + return -ENOMEM; + + r5_core->tcm_banks[j] = tcm; + + /* Get power-domains id of TCM. */ + ret = of_parse_phandle_with_args(np, "power-domains", +"#power-domain-cells", +tcm_pd_idx, &out_args); + if (ret) { + dev_err(r5_core->dev, + "failed to get tcm %d pm domain, ret %d\n", + tcm_pd_idx, ret); + return ret; + } + tcm->pm_domain_id = out_args.args[0]; + of_node_put(out_args.np); + + /* Get TCM address without translation.
*/ + ret = of_property_read_reg(np, j, &abs_addr, &size); + if (ret) { + dev_err(dev, "failed to get reg property\n"); + return ret; + } + +
[PATCH v10 0/4] add zynqmp TCM bindings
Tightly-Coupled Memories (TCMs) are low-latency memory that provides predictable instruction execution and predictable data load/store timing. Each Cortex-R5F processor contains two exclusive 64 KB memory banks on the ATCM and BTCM ports, for a total of 128 KB of memory. In lockstep mode, both 128 KB memory banks are accessible to the cluster. As per the ZynqMP Ultrascale+ Technical Reference Manual UG1085, the following is the address space of TCM memory. The bindings in this patch series introduce properties to accommodate the following address space, with address translation between the Linux and Cortex-R5 views.

| *Mode*             | *R5 View*    | *Linux view* | Notes                |
| ------------------ | ------------ | ------------ | -------------------- |
| *Split Mode*       | *start addr* | *start addr* |                      |
| R5_0 ATCM (64 KB)  | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (64 KB)  | 0x0002_0000  | 0xFFE2_0000  |                      |
| R5_1 ATCM (64 KB)  | 0x0000_0000  | 0xFFE9_0000  | alias of 0xFFE1_0000 |
| R5_1 BTCM (64 KB)  | 0x0002_0000  | 0xFFEB_0000  | alias of 0xFFE3_0000 |
| *Lockstep Mode*    |              |              |                      |
| R5_0 ATCM (128 KB) | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (128 KB) | 0x0002_0000  | 0xFFE2_0000  |                      |

References: UG1085 TCM address space: https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map Changes in v10: - Add new patch (1/4) to series that changes hardcoded TCM addresses in lockstep mode and removes separate handling of TCM in lockstep and split mode - modify number of "reg", "reg-names" and "power-domains" entries based on cluster mode - Add extra optional atcm and btcm in "reg" property for lockstep mode - Add "reg-names" for extra optional atcm and btcm for lockstep mode - Drop previous Ack as bindings has new change - Add individual tcm regions via "reg" and "reg-names" for lockstep mode - Add each tcm's power-domains in lockstep mode - Drop previous Ack as new change in dts patchset - Remove redundant changes in driver to handle TCM in lockstep mode Changes in v9: - Fix rproc lockstep dts - Introduce new API to request and release core1 TCM power-domains in lockstep mode.
This will be used during prepare -> add_tcm_banks callback to enable TCM in lockstep mode. - Parse TCM from device-tree in lockstep mode and split mode in uniform way. - Fix TCM representation in device-tree in lockstep mode. - Fix comments as suggested Changes in v8: - Remove use of pm_domains framework - Remove checking of pm_domain_id validation to power on/off tcm - Remove spurious change - parse power-domains property from device-tree and use EEMI calls to power on/off TCM instead of using pm domains framework Changes in v7: - %s/pm_dev1/pm_dev_core0/r - %s/pm_dev_link1/pm_dev_core0_link/r - %s/pm_dev2/pm_dev_core1/r - %s/pm_dev_link2/pm_dev_core1_link/r - remove pm_domain_id check to move next patch - add comment about how 1st entry in pm domain list is used - fix loop when jump to fail_add_pm_domains loop - move checking of pm_domain_id from previous patch - fix mem_bank_data memory allocation Changes in v6: - Introduce new node entry for r5f cluster split mode dts and keep it disabled by default. - Keep remoteproc lockstep mode enabled by default to maintian back compatibility. - Enable split mode only for zcu102 board to demo split mode use - Remove spurious change - Handle errors in add_pm_domains function - Remove redundant code to handle errors from remove_pm_domains - Missing . at the end of the commit message - remove redundant initialization of variables - remove fail_tcm label and relevant code to free memory acquired using devm_* API. 
As this will be freed when device free it - add extra check to see if "reg" property is supported or not Changes in v5: - maintain Rob's Ack on bindings patch as no changes in bindings - split previous patch into multiple patches - Use pm domain framework to turn on/off TCM - Add support of parsing TCM information from device-tree - maintain backward compatibility with previous bindings without TCM information available in device-tree This patch series continues previous effort to upstream ZynqMP TCM bindings: Previous v4 version link: https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/ Previous v3 version link: https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/ Radhey Shyam Pandey (1): dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings Tanmay Shah (3): remoteproc: zynqmp: fix lockstep mode memory region dts: zynqmp: add properties for TCM in remoteproc remoteproc: zyn
[PATCH v10 1/4] remoteproc: zynqmp: fix lockstep mode memory region
In lockstep mode, r5 core0 uses TCM of R5 core1. Following is the lockstep mode memory region as per the hardware reference manual.

| *TCM*              | *R5 View*   | *Linux view* |
| ------------------ | ----------- | ------------ |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above because R5 core0 TCM and core1 TCM have different power-domains mapped to them. Hence, the TCM address space in lockstep mode should be modeled as 64 KB regions only, where each region has its own power-domain, as follows:

| *TCM*              | *R5 View*   | *Linux view* |
| ------------------ | ----------- | ------------ |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This makes driver maintenance easy and makes the design robust for future platforms as well. Signed-off-by: Tanmay Shah --- drivers/remoteproc/xlnx_r5_remoteproc.c | 145 ++-- 1 file changed, 12 insertions(+), 133 deletions(-) diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c index 4395edea9a64..42b0384d34f2 100644 --- a/drivers/remoteproc/xlnx_r5_remoteproc.c +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c @@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] = { {0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"}, }; -/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */ +/* In lockstep mode cluster uses each 64KB TCM from second core as well */ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = { - {0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */ - {0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"}, - {0, 0, 0, PD_R5_1_ATCM, ""}, - {0, 0, 0, PD_R5_1_BTCM, ""}, + {0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */ + {0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"}, + {0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"}, + {0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"}, }; /** @@ -540,14 +540,14 @@ static int tcm_mem_map(struct rproc *rproc, } /* -
add_tcm_carveout_split_mode() + * add_tcm_banks() * @rproc: single R5 core's corresponding rproc instance * - * allocate and add remoteproc carveout for TCM memory in split mode + * allocate and add remoteproc carveout for TCM memory * * return 0 on success, otherwise non-zero value on failure */ -static int add_tcm_carveout_split_mode(struct rproc *rproc) +static int add_tcm_banks(struct rproc *rproc) { struct rproc_mem_entry *rproc_mem; struct zynqmp_r5_core *r5_core; @@ -580,10 +580,10 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) ZYNQMP_PM_REQUEST_ACK_BLOCKING); if (ret < 0) { dev_err(dev, "failed to turn on TCM 0x%x", pm_domain_id); - goto release_tcm_split; + goto release_tcm; } - dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, size=0x%lx", + dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx", bank_name, bank_addr, da, bank_size); rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr, @@ -593,7 +593,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) if (!rproc_mem) { ret = -ENOMEM; zynqmp_pm_release_node(pm_domain_id); - goto release_tcm_split; + goto release_tcm; } rproc_add_carveout(rproc, rproc_mem); @@ -601,7 +601,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) return 0; -release_tcm_split: +release_tcm: /* If failed, Turn off all TCM banks turned on before */ for (i--; i >= 0; i--) { pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id; @@ -610,127 +610,6 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) return ret; } -/* - * add_tcm_carveout_lockstep_mode() - * @rproc: single R5 core's corresponding rproc instance - * - * allocate and add remoteproc carveout for TCM memory in lockstep mode - * - * return 0 on success, otherwise non-zero value on failure - */ -static int add_tcm_carveout_lockstep_mode(struct rproc *rproc) -{ - struct rproc_mem_entry *rproc_mem; - struct zynqmp_r5_core *r5_core; - int i, num_banks, ret; - phys_addr_t bank_addr; - size_t bank_size = 0; - 
struct device *dev; - u32 pm_domain_id; - char *bank_name; - u32 da; - - r5_core = rproc->priv; - dev = r5_core->dev; - - /* Go through zynqmp banks for r5 node */ - num_banks = r5_core->tcm_bank
[PATCH v2] tracing: Have saved_cmdlines arrays all in one allocation
From: "Steven Rostedt (Google)"

The saved_cmdlines have three arrays for mapping PIDs to COMMs:

 - map_pid_to_cmdline[]
 - map_cmdline_to_pid[]
 - saved_cmdlines

The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
full pid as it can be larger than PID_MAX_DEFAULT. And the
saved_cmdlines[] just holds the COMMs associated to the pids.

Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
together (in reality the saved_cmdlines is just in the memory of the
rounding of the allocation of the structure as it is always allocated in
powers of two). The map_cmdline_to_pid[] array is allocated separately.

Since the rounding to a power of two is rather large (it allows for 8000
elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
(This drops it to 6000 by default, which is still plenty for most use
cases). This saves even more memory as the map_cmdline_to_pid[] array
doesn't need to be allocated.

Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/

Signed-off-by: Steven Rostedt (Google)
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240212180941.379c4...@gandalf.local.home/

-- Added SAVED_CMDLINE_MAP_ELEMENT_SIZE helper macro.
 kernel/trace/trace_sched_switch.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e4fbcc3bede5..5f3e9bc87079 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -175,6 +175,10 @@ struct saved_cmdlines_buffer {
 };
 static struct saved_cmdlines_buffer *savedcmd;
 
+/* Holds the size of a cmdline and pid element */
+#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)			\
+	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
+
 static inline char *get_saved_cmdlines(int idx)
 {
 	return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
@@ -201,7 +205,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	int order;
 
 	/* Figure out how much is needed to hold the given number of cmdlines */
-	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+	orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
 	order = get_order(orig_size);
 	size = 1 << (order + PAGE_SHIFT);
 	page = alloc_pages(GFP_KERNEL, order);
@@ -212,16 +216,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	memset(s, 0, sizeof(*s));
 
 	/* Round up to actual allocation */
-	val = (size - sizeof(*s)) / TASK_COMM_LEN;
+	val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
 	s->cmdline_num = val;
 
-	s->map_cmdline_to_pid = kmalloc_array(val,
-					      sizeof(*s->map_cmdline_to_pid),
-					      GFP_KERNEL);
-	if (!s->map_cmdline_to_pid) {
-		free_saved_cmdlines_buffer(s);
-		return NULL;
-	}
+	/* Place map_cmdline_to_pid array right after saved_cmdlines */
+	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
 
 	s->cmdline_idx = 0;
 	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
-- 
2.43.0
Re: [PATCH] tracing: Have saved_cmdlines arrays all in one allocation
On Mon, 2024-02-12 at 19:13 -0500, Steven Rostedt wrote: > On Mon, 12 Feb 2024 15:39:03 -0800 > Tim Chen wrote: > > > > diff --git a/kernel/trace/trace_sched_switch.c > > > b/kernel/trace/trace_sched_switch.c > > > index e4fbcc3bede5..210c74dcd016 100644 > > > --- a/kernel/trace/trace_sched_switch.c > > > +++ b/kernel/trace/trace_sched_switch.c > > > @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer > > > *allocate_cmdlines_buffer(unsigned int val) > > > int order; > > > > > > /* Figure out how much is needed to hold the given number of cmdlines */ > > > - orig_size = sizeof(*s) + val * TASK_COMM_LEN; > > > + orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int)); > > > > Strictly speaking, *map_cmdline_to_pid is unsigned int so it is more > > consistent > > to use sizeof(unsigned) in line above. But I'm nitpicking and I'm fine to > > leave it as is. > > I was thinking about making that into a macro as it is used in two places. > > /* Holds the size of a cmdline and pid element */ > #define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \ > (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0])) > > orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s); > > Looks good. This makes the code more readable. Tim
Re: [PATCH V2 3/3] vdpa_sim: flush workers on suspend
On Mon, Feb 12, 2024 at 6:16 PM Steve Sistare wrote:
>
> Flush to guarantee no workers are running when suspend returns.
>
> Signed-off-by: Steve Sistare
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index be2925d0d283..a662b90357c3 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
> 	kthread_flush_work(work);
> }
>
> +static void flush_work_fn(struct kthread_work *work) {}
> +
> +static void vdpasim_flush_work(struct vdpasim *vdpasim)
> +{
> +	struct kthread_work work;
> +
> +	kthread_init_work(&work, flush_work_fn);

If the work is already queued, doesn't it break the linked list because of the memset in kthread_init_work?

> +	kthread_queue_work(vdpasim->worker, &work);
> +	kthread_flush_work(&work);
> +}
> +
> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
> {
> 	return container_of(vdpa, struct vdpasim, vdpa);
> @@ -511,6 +522,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
> 	vdpasim->running = false;
> 	mutex_unlock(&vdpasim->mutex);
>
> +	vdpasim_flush_work(vdpasim);

Do we need to protect the case where vdpasim_kick_vq and vdpasim_suspend are called "at the same time"? Correct userland should not be doing it, but buggy or malicious userland could be. Just calling vdpasim_flush_work with the mutex acquired would solve the issue, doesn't it?

Thanks!

> +
> 	return 0;
> }
>
> --
> 2.39.3
>
Re: [PATCH V2 1/3] vhost-vdpa: flush workers on suspend
On Mon, Feb 12, 2024 at 6:16 PM Steve Sistare wrote: > > Flush to guarantee no workers are running when suspend returns. > > Signed-off-by: Steve Sistare Acked-by: Eugenio Pérez Should this have a Fixes tag? > --- > drivers/vhost/vdpa.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index bc4a51e4638b..a3b986c24805 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -594,10 +594,13 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v) > struct vdpa_device *vdpa = v->vdpa; > const struct vdpa_config_ops *ops = vdpa->config; > int ret; > + struct vhost_dev *vdev = &v->vdev; > > if (!ops->suspend) > return -EOPNOTSUPP; > > + vhost_dev_flush(vdev); > + > ret = ops->suspend(vdpa); > if (!ret) > v->suspended = true; > -- > 2.39.3 >
Re: [Linux-stm32] [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware
Hello Mathieu, On 2/5/24 10:13, Arnaud POULIQUEN wrote: > > > On 2/2/24 20:53, Mathieu Poirier wrote: >> On Thu, Feb 01, 2024 at 07:33:35PM +0100, Arnaud POULIQUEN wrote: >>> >>> >>> On 2/1/24 17:02, Mathieu Poirier wrote: On Thu, Feb 01, 2024 at 04:06:37PM +0100, Arnaud POULIQUEN wrote: > hello Mathieu, > > On 1/31/24 19:52, Mathieu Poirier wrote: >> On Tue, Jan 30, 2024 at 10:13:48AM +0100, Arnaud POULIQUEN wrote: >>> >>> >>> On 1/26/24 18:11, Mathieu Poirier wrote: On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote: > The new TEE remoteproc device is used to manage remote firmware in a > secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is > introduced to delegate the loading of the firmware to the trusted > execution context. In such cases, the firmware should be signed and > adhere to the image format defined by the TEE. > > Signed-off-by: Arnaud Pouliquen > --- > V1 to V2 update: > - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as > detected by > the kernel test robot: > WARNING: unmet direct dependencies detected for TEE_REMOTEPROC > Depends on [n]: REMOTEPROC [=y] && OPTEE [=n] > Selected by [y]: > - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && > REMOTEPROC [=y] > - Fix initialized trproc variable in stm32_rproc_probe > --- > drivers/remoteproc/stm32_rproc.c | 149 > +-- > 1 file changed, 144 insertions(+), 5 deletions(-) > > diff --git a/drivers/remoteproc/stm32_rproc.c > b/drivers/remoteproc/stm32_rproc.c > index fcc0001e2657..cf6a21bac945 100644 > --- a/drivers/remoteproc/stm32_rproc.c > +++ b/drivers/remoteproc/stm32_rproc.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > #include > > #include "remoteproc_internal.h" > @@ -49,6 +50,9 @@ > #define M4_STATE_STANDBY 4 > #define M4_STATE_CRASH 5 > > +/* Remote processor unique identifier aligned with the Trusted > Execution Environment definitions */ > +#define STM32_MP1_M4_PROC_ID0 > + > struct stm32_syscon { > struct regmap *map; > u32 
reg; > @@ -90,6 +94,8 @@ struct stm32_rproc { > struct stm32_mbox mb[MBOX_NB_MBX]; > struct workqueue_struct *workqueue; > bool hold_boot_smc; > + bool fw_loaded; > + struct tee_rproc *trproc; > void __iomem *rsc_va; > }; > > @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc > *rproc) > return err; > } > > +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc, > + const struct firmware *fw) > +{ > + struct stm32_rproc *ddata = rproc->priv; > + unsigned int ret = 0; > + > + if (rproc->state == RPROC_DETACHED) > + return 0; > + > + ret = tee_rproc_load_fw(ddata->trproc, fw); > + if (!ret) > + ddata->fw_loaded = true; > + > + return ret; > +} > + > +static int stm32_rproc_tee_elf_load(struct rproc *rproc, > + const struct firmware *fw) > +{ > + struct stm32_rproc *ddata = rproc->priv; > + unsigned int ret; > + > + /* > + * This function can be called by remote proc for recovery > + * without the sanity check. In this case we need to load the > firmware > + * else nothing done here as the firmware has been preloaded > for the > + * sanity check to be able to parse it for the resource table. > + */ This comment is very confusing - please consider refactoring. > + if (ddata->fw_loaded) > + return 0; > + I'm not sure about keeping a flag to indicate the status of the loaded firmware. It is not done for the non-secure method, I don't see why it would be needed for the secure one. >>> >>> The difference is on the sanity check. >>> - in rproc_elf_sanity_check we parse the elf file to verify that it is >>> valid. >>> - in stm32_rproc_tee_elf_sanity_check we have to do the same, that >>> means to >>> authenticate it. the authentication is done during the load. >>> >>> So this flag is used to avoid
Re: [PATCH v2 2/4] dt-bindings: remoteproc: Add compatibility for TEE support
Hello Rob, On 1/30/24 18:51, Rob Herring wrote: > On Thu, Jan 18, 2024 at 11:04:31AM +0100, Arnaud Pouliquen wrote: >> The "st,stm32mp1-m4-tee" compatible is utilized in a system configuration >> where the Cortex-M4 firmware is loaded by the Trusted execution Environment >> (TEE). >> For instance, this compatible is used in both the Linux and OP-TEE >> device-tree: >> - In OP-TEE, a node is defined in the device tree with the >> st,stm32mp1-m4-tee to support signed remoteproc firmware. >> Based on DT properties, OP-TEE authenticates, loads, starts, and stops >> the firmware. >> - On Linux, when the compatibility is set, the Cortex-M resets should not >> be declared in the device tree. >> >> Signed-off-by: Arnaud Pouliquen >> --- >> V1 to V2 updates >> - update "st,stm32mp1-m4" compatible description to generalize >> - remove the 'reset-names' requirement in one conditional branch, as the >> property is already part of the condition test. >> --- >> .../bindings/remoteproc/st,stm32-rproc.yaml | 52 +++ >> 1 file changed, 43 insertions(+), 9 deletions(-) >> >> diff --git >> a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml >> b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml >> index 370af61d8f28..6af821b15736 100644 >> --- a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml >> +++ b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml >> @@ -16,7 +16,12 @@ maintainers: >> >> properties: >>compatible: >> -const: st,stm32mp1-m4 >> +enum: >> + - st,stm32mp1-m4 >> + - st,stm32mp1-m4-tee >> +description: >> + Use "st,stm32mp1-m4" for the Cortex-M4 coprocessor management by >> non-secure context >> + Use "st,stm32mp1-m4-tee" for the Cortex-M4 coprocessor management by >> secure context >> >>reg: >> description: >> @@ -142,21 +147,40 @@ properties: >> required: >>- compatible >>- reg >> - - resets >> >> allOf: >>- if: >>properties: >> -reset-names: >> - not: >> -contains: >> - const: hold_boot >> +compatible: >> + 
contains: >> +const: st,stm32mp1-m4 >> +then: >> + if: >> +properties: >> + reset-names: >> +not: >> + contains: >> +const: hold_boot > > Note that this is true when 'reset-names' is not present. If that is not > desired, then you need 'required: [reset-names]'. Not really a new issue > though. > Yes that corresponds to my expectation, for compatibility with legacy DT. If the hold_boot reset was not used, reset-names was not mandatory I will add the 'required: [reset-names]' in the else Thanks, Arnaud >> + then: >> +required: >> + - st,syscfg-holdboot >> + - resets >> + else: >> +properties: >> + st,syscfg-holdboot: false >> +required: >> + - resets > > 'resets' is always required within the outer 'then' schema, so you can > move this up a level. > >> + >> + - if: >> + properties: >> +compatible: >> + contains: >> +const: st,stm32mp1-m4-tee >> then: >> - required: >> -- st,syscfg-holdboot >> -else: >>properties: >> st,syscfg-holdboot: false >> +reset-names: false >> +resets: false >> >> additionalProperties: false >> >> @@ -188,5 +212,15 @@ examples: >>st,syscfg-rsc-tbl = <&tamp 0x144 0x>; >>st,syscfg-m4-state = <&tamp 0x148 0x>; >> }; >> + - | >> +#include >> +m4@1000 { >> + compatible = "st,stm32mp1-m4-tee"; >> + reg = <0x1000 0x4>, >> +<0x3000 0x4>, >> +<0x3800 0x1>; >> + st,syscfg-rsc-tbl = <&tamp 0x144 0x>; >> + st,syscfg-m4-state = <&tamp 0x148 0x>; >> +}; >> >> ... >> -- >> 2.25.1 >>
Re: [PATCH V2] vdpa_sim: reset must not run
Changes in V2: - test DRIVER_OK instead of FEATURES_OK - post singly, and add RB and acks - Steve On 2/9/2024 5:30 PM, Steve Sistare wrote: > vdpasim_do_reset sets running to true, which is wrong, as it allows > vdpasim_kick_vq to post work requests before the device has been > configured. To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK > is set. > > Fixes: 0c89e2a3a9d0 ("vdpa_sim: Implement suspend vdpa op") > Signed-off-by: Steve Sistare > Reviewed-by: Eugenio Pérez > Acked-by: Jason Wang > --- > drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c > b/drivers/vdpa/vdpa_sim/vdpa_sim.c > index be2925d0d283..18584ce70bf0 100644 > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c > @@ -160,7 +160,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim, u32 > flags) > } > } > > - vdpasim->running = true; > + vdpasim->running = false; > spin_unlock(&vdpasim->iommu_lock); > > vdpasim->features = 0; > @@ -483,6 +483,7 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, > u8 status) > > mutex_lock(&vdpasim->mutex); > vdpasim->status = status; > + vdpasim->running = (status & VIRTIO_CONFIG_S_DRIVER_OK) != 0; > mutex_unlock(&vdpasim->mutex); > } >
Re: [PATCH V2 0/3] flush workers on suspend
On 2/12/2024 12:16 PM, Steve Sistare wrote: > Flush to guarantee no workers are running when suspend returns, > for vdpa, vpa_sim, and vduse. (mlx5 already does so, via the path > mlx5_vdpa_suspend -> unregister_link_notifier -> flush_workqueue.) Changes in V2: - renamed "vduse: suspend" (was vduse: flush workers on suspend) - call vhost_dev_flush unconditionally in "vhost-vdpa: flush workers on suspend" - Steve > Steve Sistare (3): > vhost-vdpa: flush workers on suspend > vduse: suspend > vdpa_sim: flush workers on suspend > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 + > drivers/vdpa/vdpa_user/vduse_dev.c | 24 > drivers/vhost/vdpa.c | 3 +++ > 3 files changed, 40 insertions(+) >
Re: [PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK
On 2/13/2024 9:58 AM, Eugenio Perez Martin wrote: > On Tue, Feb 13, 2024 at 3:26 PM Steve Sistare > wrote: >> >> If a vdpa device is not in state DRIVER_OK, then there is no driver state >> to preserve, so no need to call the suspend and resume driver ops. >> >> Suggested-by: Eugenio Perez Martin " >> Signed-off-by: Steve Sistare > > Reviewed-by: Eugenio Pérez > > Please include a changelog from previous versions for the next series. > > Thanks! Will do, thanks - steve >> --- >> drivers/vhost/vdpa.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c >> index bc4a51e4638b..aef92a7c57f3 100644 >> --- a/drivers/vhost/vdpa.c >> +++ b/drivers/vhost/vdpa.c >> @@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v) >> const struct vdpa_config_ops *ops = vdpa->config; >> int ret; >> >> + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) >> + return 0; >> + >> if (!ops->suspend) >> return -EOPNOTSUPP; >> >> @@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v) >> const struct vdpa_config_ops *ops = vdpa->config; >> int ret; >> >> + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) >> + return 0; >> + >> if (!ops->resume) >> return -EOPNOTSUPP; >> >> -- >> 2.39.3 >> >
Re: [PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK
On Tue, Feb 13, 2024 at 3:26 PM Steve Sistare wrote: > > If a vdpa device is not in state DRIVER_OK, then there is no driver state > to preserve, so no need to call the suspend and resume driver ops. > > Suggested-by: Eugenio Perez Martin " > Signed-off-by: Steve Sistare Reviewed-by: Eugenio Pérez Please include a changelog from previous versions for the next series. Thanks! > --- > drivers/vhost/vdpa.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index bc4a51e4638b..aef92a7c57f3 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v) > const struct vdpa_config_ops *ops = vdpa->config; > int ret; > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) > + return 0; > + > if (!ops->suspend) > return -EOPNOTSUPP; > > @@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v) > const struct vdpa_config_ops *ops = vdpa->config; > int ret; > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) > + return 0; > + > if (!ops->resume) > return -EOPNOTSUPP; > > -- > 2.39.3 >
Re: [PATCH v6.1.y-v4.19.y] vhost: use kzalloc() instead of kmalloc() followed by memset()
On Mon, Feb 05, 2024 at 10:49:37AM +0530, Ajay Kaher wrote: > From: Prathu Baronia > > From: Prathu Baronia > > commit 4d8df0f5f79f747d75a7d356d9b9ea40a4e4c8a9 upstream > > Use kzalloc() to allocate new zeroed out msg node instead of > memsetting a node allocated with kmalloc(). > > Signed-off-by: Prathu Baronia > Message-Id: <20230522085019.42914-1-prathubaronia2...@gmail.com> > Signed-off-by: Michael S. Tsirkin > Reviewed-by: Stefano Garzarella > [Ajay: This is a security fix as per CVE-2024-0340] And this is why I am so grumpy about Red Hat and CVEs, they know how to let us know about stuff like this, but no. Hopefully, someday soon, they will soon not be allowed to do this anymore. {sigh} Now queued up, thanks. greg k-h
[PATCH V2] vdpa: skip suspend/resume ops if not DRIVER_OK
If a vdpa device is not in state DRIVER_OK, then there is no driver state
to preserve, so no need to call the suspend and resume driver ops.

Suggested-by: Eugenio Perez Martin
Signed-off-by: Steve Sistare
---
 drivers/vhost/vdpa.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index bc4a51e4638b..aef92a7c57f3 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -595,6 +595,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
 	const struct vdpa_config_ops *ops = vdpa->config;
 	int ret;
 
+	if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+		return 0;
+
 	if (!ops->suspend)
 		return -EOPNOTSUPP;
 
@@ -615,6 +618,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
 	const struct vdpa_config_ops *ops = vdpa->config;
 	int ret;
 
+	if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+		return 0;
+
 	if (!ops->resume)
 		return -EOPNOTSUPP;
 
-- 
2.39.3
Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL
On Fri, Feb 2, 2024 at 5:55 PM Breno Leitao wrote: > > From: Jakub Kicinski > > softnet_data->time_squeeze is sometimes used as a proxy for > host overload or indication of scheduling problems. In practice > this statistic is very noisy and has hard to grasp units - > e.g. is 10 squeezes a second to be expected, or high? > > Delaying network (NAPI) processing leads to drops on NIC queues > but also RTT bloat, impacting pacing and CA decisions. > Stalls are a little hard to detect on the Rx side, because > there may simply have not been any packets received in given > period of time. Packet timestamps help a little bit, but > again we don't know if packets are stale because we're > not keeping up or because someone (*cough* cgroups) > disabled IRQs for a long time. Please note that adding other sysfs entries is expensive for workloads creating/deleting netdev and netns often. I _think_ we should find a way for not creating /sys/class/net//queues/tx-{Q}/byte_queue_limits directory and files for non BQL enabled devices (like loopback !)
[PATCH v1] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed. For example, virtio-mem currently doesn't support suspend+resume, and trying to freeze the device will always fail. Afterwards, the device will no longer respond to resize requests, because it won't get notified about config changes. Let's fix this by re-enabling the config if freezing fails. Fixes: 22b7050a024d ("virtio: defer config changed notifications") Cc: Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Xuan Zhuo Signed-off-by: David Hildenbrand --- drivers/virtio/virtio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index f4080692b351..f513ee21b1c1 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev) if (drv && drv->freeze) { ret = drv->freeze(dev); - if (ret) + if (ret) { + virtio_config_enable(dev); return ret; + } } if (dev->config->destroy_avq) base-commit: c664e16bb1ba1c8cf1d7ecf3df5fd83bbb8ac15a -- 2.43.0
Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL
On Tue, Feb 06, 2024 at 12:40:13PM +0100, Paolo Abeni wrote: > On Fri, 2024-02-02 at 08:53 -0800, Breno Leitao wrote: > > From: Jakub Kicinski > > > > softnet_data->time_squeeze is sometimes used as a proxy for > > host overload or indication of scheduling problems. In practice > > this statistic is very noisy and has hard to grasp units - > > e.g. is 10 squeezes a second to be expected, or high? > > > > Delaying network (NAPI) processing leads to drops on NIC queues > > but also RTT bloat, impacting pacing and CA decisions. > > Stalls are a little hard to detect on the Rx side, because > > there may simply have not been any packets received in given > > period of time. Packet timestamps help a little bit, but > > again we don't know if packets are stale because we're > > not keeping up or because someone (*cough* cgroups) > > disabled IRQs for a long time. > > > > We can, however, use Tx as a proxy for Rx stalls. Most drivers > > use combined Rx+Tx NAPIs so if Tx gets starved so will Rx. > > On the Tx side we know exactly when packets get queued, > > and completed, so there is no uncertainty. > > > > This patch adds stall checks to BQL. Why BQL? Because > > it's a convenient place to add such checks, already > > called by most drivers, and it has copious free space > > in its structures (this patch adds no extra cache > > references or dirtying to the fast path). > > > > The algorithm takes one parameter - max delay AKA stall > > threshold and increments a counter whenever NAPI got delayed > > for at least that amount of time. It also records the length > > of the longest stall. > > > > To be precise every time NAPI has not polled for at least > > stall thrs we check if there were any Tx packets queued > > between last NAPI run and now - stall_thrs/2. > > > > Unlike the classic Tx watchdog this mechanism does not > > ignore stalls caused by Tx being disabled, or loss of link. 
> > I don't think the check is worth the complexity, and > > stall is a stall, whether due to host overload, flow > > control, link down... doesn't matter much to the application. > > > > We have been running this detector in production at Meta > > for 2 years, with the threshold of 8ms. It's the lowest > > value where false positives become rare. There's still > > a constant stream of reported stalls (especially without > > the ksoftirqd deferral patches reverted), those who like > > their stall metrics to be 0 may prefer higher value. > > > > Signed-off-by: Jakub Kicinski > > Signed-off-by: Breno Leitao > > --- > > Changelog: > > > > v1: > > * https://lore.kernel.org/netdev/202306172057.jx7yhliu-...@intel.com/T/ > > > > v2: > > * Fix the documentation file in patch 0001, since patch 0002 will > > touch it later. > > * Fix the kernel test robot issues, marking functions as statics. > > * Use #include instead of . > > * Added some new comments around, mainly around barriers. > > * Format struct `netdev_queue_attribute` assignments to make > > checkpatch happy. > > * Updated and fixed the path in sysfs-class-net-queues > > documentation. > > > > v3: > > * Sending patch 0002 against net-next. > > - The first patch was accepted against 'net' > > > > --- > > .../ABI/testing/sysfs-class-net-queues| 23 +++ > > include/linux/dynamic_queue_limits.h | 35 +++ > > include/trace/events/napi.h | 33 ++ > > lib/dynamic_queue_limits.c| 58 + > > net/core/net-sysfs.c | 62 +++ > > 5 files changed, 211 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-class-net-queues > > b/Documentation/ABI/testing/sysfs-class-net-queues > > index 5bff64d256c2..45bab9b6e3ae 100644 > > --- a/Documentation/ABI/testing/sysfs-class-net-queues > > +++ b/Documentation/ABI/testing/sysfs-class-net-queues > > @@ -96,3 +96,26 @@ Description: > > Indicates the absolute minimum limit of bytes allowed to be > > queued on this network device transmit queue. Default value is > > 0. 
> > + > > +What: > > /sys/class/net//queues/tx-/byte_queue_limits/stall_thrs > > +Date: Jan 2024 > > +KernelVersion: 6.9 > > +Contact: net...@vger.kernel.org > > +Description: > > + Tx completion stall detection threshold. > > Possibly worth mentioning it's in milliseconds > > > Kernel will guarantee > > + to detect all stalls longer than this threshold but may also > > + detect stalls longer than half of the threshold. > > + > > +What: > > /sys/class/net//queues/tx-/byte_queue_limits/stall_cnt > > +Date: Jan 2024 > > +KernelVersion: 6.9 > > +Contact: net...@vger.kernel.org > > +Description: > > + Number of detected Tx completion stalls. > > + > > +What: > > /sys/class/net//queues/tx-/byte_queue_limits/stall_max > > +Date: Jan 2024 > >
[PATCH] tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
Commit a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default") attempted to fix an issue with direct trampolines on x86, see its description for details. However, it wrongly referenced the HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still present. Add the missing "CONFIG_" prefix for the logic to work as intended. Fixes: a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default") Signed-off-by: Petr Pavlu --- kernel/trace/ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index c060d5b47910..83ba342aef31 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5331,7 +5331,7 @@ static int register_ftrace_function_nolock(struct ftrace_ops *ops); * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it * jumps from ftrace_caller for multiple ftrace_ops. */ -#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS +#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS) #else #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS) base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de -- 2.35.3
Re: [PATCH v2 0/3] Add RPMPD support for MSM8974
On Sat, 10 Feb 2024 at 17:39, Luca Weiss wrote: > > Add driver support for the RPM power domains found on the different > MSM8974 devices. > > Devicetree integration will come at a later point since also some > mostly remoteproc drivers need to be adjusted. > > Also the MX power domains on this SoC seems to work quite a bit > differently, we'd need to send raw voltages to it, so these are ignored > in this series. > > Signed-off-by: Luca Weiss Applied for next, thanks! Note that patch1 is also available at the immutable dt branch, if soc maintainers need to pull it in too. Kind regards Uffe > --- > Changes in v2: > - Drop MSM8974_VDDGFX_AO in all patches > - Link to v1: > https://lore.kernel.org/r/20240210-msm8974-rpmpd-v1-0-de9355e68...@z3ntu.xyz > > --- > Luca Weiss (3): > dt-bindings: power: rpmpd: Add MSM8974 power domains > pmdomain: qcom: rpmpd: Add MSM8974+PM8841 power domains > pmdomain: qcom: rpmpd: Add MSM8974PRO+PMA8084 power domains > > .../devicetree/bindings/power/qcom,rpmpd.yaml | 2 + > drivers/pmdomain/qcom/rpmpd.c | 83 > ++ > include/dt-bindings/power/qcom-rpmpd.h | 7 ++ > 3 files changed, 92 insertions(+) > --- > base-commit: 6e3fa474051f3d276ea708bdb8e8e1f66d1d3ee5 > change-id: 20240210-msm8974-rpmpd-6e48fe374275 > > Best regards, > -- > Luca Weiss >
[PATCH v17 5/6] Documentation: tracing: Add ring-buffer mapping
It is now possible to mmap() a ring-buffer to stream its content. Add some documentation and a code example. Signed-off-by: Vincent Donnefort diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 5092d6c13af5..0b300901fd75 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -29,6 +29,7 @@ Linux Tracing Technologies timerlat-tracer intel_th ring-buffer-design + ring-buffer-map stm sys-t coresight/index diff --git a/Documentation/trace/ring-buffer-map.rst b/Documentation/trace/ring-buffer-map.rst new file mode 100644 index ..0426ab4bcf3d --- /dev/null +++ b/Documentation/trace/ring-buffer-map.rst @@ -0,0 +1,106 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +Tracefs ring-buffer memory mapping +== + +:Author: Vincent Donnefort + +Overview + +Tracefs ring-buffer memory map provides an efficient method to stream data +as no memory copy is necessary. The application mapping the ring-buffer becomes +then a consumer for that ring-buffer, in a similar fashion to trace_pipe. + +Memory mapping setup + +The mapping works with a mmap() of the trace_pipe_raw interface. + +The first system page of the mapping contains ring-buffer statistics and +description. It is referred as the meta-page. One of the most important field of +the meta-page is the reader. It contains the sub-buffer ID which can be safely +read by the mapper (see ring-buffer-design.rst). + +The meta-page is followed by all the sub-buffers, ordered by ascendant ID. It is +therefore effortless to know where the reader starts in the mapping: + +.. code-block:: c + +reader_id = meta->reader->id; +reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size; + +When the application is done with the current reader, it can get a new one using +the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also updates +the meta-page fields. 
+
+Limitations
+===========
+
+When a mapping is in place on a Tracefs ring-buffer, it is not possible to
+resize it (either by increasing the entire size of the ring-buffer or each
+subbuf). It is also not possible to use snapshot, and mapping causes splice to
+copy the ring-buffer data instead of using the copyless swap from the
+ring-buffer.
+
+Concurrent readers (either another application mapping that ring-buffer or the
+kernel with trace_pipe) are allowed but not recommended. They will compete for
+the ring-buffer and the output is unpredictable, just like concurrent readers
+on trace_pipe would be.
+
+Example
+=======
+
+.. code-block:: c
+
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <unistd.h>
+
+    #include <linux/trace_mmap.h>
+
+    #include <sys/mman.h>
+    #include <sys/ioctl.h>
+
+    #define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw"
+
+    int main(void)
+    {
+            int page_size = getpagesize(), fd, reader_id;
+            unsigned long meta_len, data_len;
+            struct trace_buffer_meta *meta;
+            void *map, *reader, *data;
+
+            fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK);
+            if (fd < 0)
+                    exit(EXIT_FAILURE);
+
+            map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
+            if (map == MAP_FAILED)
+                    exit(EXIT_FAILURE);
+
+            meta = (struct trace_buffer_meta *)map;
+            meta_len = meta->meta_page_size;
+
+            printf("entries:        %llu\n", meta->entries);
+            printf("overrun:        %llu\n", meta->overrun);
+            printf("read:           %llu\n", meta->read);
+            printf("nr_subbufs:     %u\n", meta->nr_subbufs);
+
+            data_len = meta->subbuf_size * meta->nr_subbufs;
+            data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len);
+            if (data == MAP_FAILED)
+                    exit(EXIT_FAILURE);
+
+            if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
+                    exit(EXIT_FAILURE);
+
+            reader_id = meta->reader.id;
+            reader = data + meta->subbuf_size * reader_id;
+
+            printf("Current reader address: %p\n", reader);
+
+            munmap(data, data_len);
+            munmap(meta, meta_len);
+            close(fd);
+
+            return 0;
+    }
-- 
2.43.0.687.g38aa6559b0-goog
[PATCH v17 4/6] tracing: Allow user-space mapping of the ring-buffer
Currently, user-space extracts data from the ring-buffer via splice, which is
handy for storage or network sharing. However, due to splice limitations, it is
impossible to do real-time analysis without a copy.

A solution for that problem is to let user-space map the ring-buffer directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first element
of the mapping is the meta-page. It is followed by each subbuffer constituting
the ring-buffer, ordered by their unique page ID:

  * Meta-page -- include/uapi/linux/trace_mmap.h for a description
  * Subbuf ID 0
  * Subbuf ID 1
    ...

It is therefore easy to translate a subbuf ID into an offset in the mapping:

  reader_id = meta->reader->id;
  reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.
Signed-off-by: Vincent Donnefort diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h index ffcd8dfcaa4f..d25b9d504a7c 100644 --- a/include/uapi/linux/trace_mmap.h +++ b/include/uapi/linux/trace_mmap.h @@ -43,4 +43,6 @@ struct trace_buffer_meta { __u64 Reserved2; }; +#define TRACE_MMAP_IOCTL_GET_READER _IO('T', 0x1) + #endif /* _TRACE_MMAP_H_ */ diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 4ebf4d0bd14c..c9e513a95cb7 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr, return; } + if (tr->mapped) { + trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n"); + trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n"); + return; + } + local_irq_save(flags); update_max_tr(tr, current, smp_processor_id(), cond_data); local_irq_restore(flags); @@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr) lockdep_assert_held(&trace_types_lock); spin_lock(&tr->snapshot_trigger_lock); - if (tr->snapshot == UINT_MAX) { + if (tr->snapshot == UINT_MAX || tr->mapped) { spin_unlock(&tr->snapshot_trigger_lock); return -EBUSY; } @@ -6533,7 +6539,7 @@ static void tracing_set_nop(struct trace_array *tr) { if (tr->current_trace == &nop_trace) return; - + tr->current_trace->enabled--; if (tr->current_trace->reset) @@ -8652,15 +8658,31 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, return ret; } -/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct ftrace_buffer_info *info = file->private_data; struct trace_iterator *iter = &info->iter; + int err; + + if (cmd == TRACE_MMAP_IOCTL_GET_READER) { + if (!(file->f_flags & O_NONBLOCK)) { + err = ring_buffer_wait(iter->array_buffer->buffer, + iter->cpu_file, + iter->tr->buffer_percent); + if (err) + return err; + } - if (cmd) - return
-ENOIOCTLCMD; + return ring_buffer_map_get_reader(iter->array_buffer->buffer, + iter->cpu_file); + } else if (cmd) { + return -ENOTTY; + } + /* +* An ioctl call with cmd 0 to the ring buffer file will wake up all +* waiters +*/ mutex_lock(&trace_types_lock); iter->wait_index++; @@ -8673,6 +8695,110 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned return 0; } +static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf) +{ + struct ftrace_buffer_info *info = vmf->vma->vm_file->private_data; + struct trace_iterator *iter = &info->iter; + vm_fault_t ret = VM_FAULT_SIGBUS; + struct page *page; + + page = ring_buffer_map_fault(iter->array_buffer->buffer, iter->cpu_file, +vmf->pgoff); + if (!page) + return ret; + + get_page(page); + vmf->page = page; + vmf->page->mapping = vmf->vma->vm_file->f_mapping; + vmf->page->index = vmf->pgoff; + + return 0; +} + +#ifdef CONFIG_TRACER_MAX_TRACE +static int get_snapshot_map(struct trace_array *tr) +{ + int err = 0; + + /* +* Called with mmap_lock held. lockdep would be unhappy if we would now +* take trace_types_lock. Instead use the specific +* s
[PATCH v17 3/6] tracing: Add snapshot refcount
When a ring-buffer is memory mapped by user-space, no trace or ring-buffer swap is possible. This means the snapshot feature is mutually exclusive with the memory mapping. Having a refcount on snapshot users will help to know if a mapping is possible or not. Instead of relying on the global trace_types_lock, a new spinlock is introduced to serialize accesses to trace_array->snapshot. This intends to allow access to that variable in a context where the mmap lock is already held. Signed-off-by: Vincent Donnefort diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2a7c6fd934e9..4ebf4d0bd14c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1300,6 +1300,50 @@ static void free_snapshot(struct trace_array *tr) tr->allocated_snapshot = false; } +static int tracing_arm_snapshot_locked(struct trace_array *tr) +{ + int ret; + + lockdep_assert_held(&trace_types_lock); + + spin_lock(&tr->snapshot_trigger_lock); + if (tr->snapshot == UINT_MAX) { + spin_unlock(&tr->snapshot_trigger_lock); + return -EBUSY; + } + + tr->snapshot++; + spin_unlock(&tr->snapshot_trigger_lock); + + ret = tracing_alloc_snapshot_instance(tr); + if (ret) { + spin_lock(&tr->snapshot_trigger_lock); + tr->snapshot--; + spin_unlock(&tr->snapshot_trigger_lock); + } + + return ret; +} + +int tracing_arm_snapshot(struct trace_array *tr) +{ + int ret; + + mutex_lock(&trace_types_lock); + ret = tracing_arm_snapshot_locked(tr); + mutex_unlock(&trace_types_lock); + + return ret; +} + +void tracing_disarm_snapshot(struct trace_array *tr) +{ + spin_lock(&tr->snapshot_trigger_lock); + if (!WARN_ON(!tr->snapshot)) + tr->snapshot--; + spin_unlock(&tr->snapshot_trigger_lock); +} + /** * tracing_alloc_snapshot - allocate snapshot buffer. 
* @@ -1373,10 +1417,6 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, mutex_lock(&trace_types_lock); - ret = tracing_alloc_snapshot_instance(tr); - if (ret) - goto fail_unlock; - if (tr->current_trace->use_max_tr) { ret = -EBUSY; goto fail_unlock; @@ -1395,6 +1435,10 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, goto fail_unlock; } + ret = tracing_arm_snapshot_locked(tr); + if (ret) + goto fail_unlock; + local_irq_disable(); arch_spin_lock(&tr->max_lock); tr->cond_snapshot = cond_snapshot; @@ -1439,6 +1483,8 @@ int tracing_snapshot_cond_disable(struct trace_array *tr) arch_spin_unlock(&tr->max_lock); local_irq_enable(); + tracing_disarm_snapshot(tr); + return ret; } EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable); @@ -1481,6 +1527,7 @@ int tracing_snapshot_cond_disable(struct trace_array *tr) } EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable); #define free_snapshot(tr) do { } while (0) +#define tracing_arm_snapshot_locked(tr) ({ -EBUSY; }) #endif /* CONFIG_TRACER_SNAPSHOT */ void tracer_tracing_off(struct trace_array *tr) @@ -6593,11 +6640,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) */ synchronize_rcu(); free_snapshot(tr); + tracing_disarm_snapshot(tr); } - if (t->use_max_tr && !tr->allocated_snapshot) { - ret = tracing_alloc_snapshot_instance(tr); - if (ret < 0) + if (t->use_max_tr) { + ret = tracing_arm_snapshot_locked(tr); + if (ret) goto out; } #else @@ -6606,8 +6654,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) if (t->init) { ret = tracer_init(t, tr); - if (ret) + if (ret) { +#ifdef CONFIG_TRACER_MAX_TRACE + if (t->use_max_tr) + tracing_disarm_snapshot(tr); +#endif goto out; + } } tr->current_trace = t; @@ -7709,10 +7762,11 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, if (tr->allocated_snapshot) ret = resize_buffer_duplicate_size(&tr->max_buffer, &tr->array_buffer, iter->cpu_file); - else - ret = 
tracing_alloc_snapshot_instance(tr); - if (ret < 0) + + ret = tracing_arm_snapshot_locked(tr); + if (ret) break; + /* Now, we're going to swap */ if (iter->cpu_file == RING_BUFFER_ALL_CPUS) { local_irq_disable(); @@ -7722,6 +7776,7 @@ tracing_snapshot_write(struct file *filp, const cha
[PATCH v17 2/6] ring-buffer: Introducing ring-buffer mapping functions
In preparation for allowing user-space to map a ring-buffer, add a set of
mapping functions:

  ring_buffer_{map,unmap}()
  ring_buffer_map_fault()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer, currently they are only
  identified through their in-kernel VA.

  A meta-page, where ring-buffer statistics and a description of the current
  reader are stored.

The linear mapping exposes the meta-page, and each subbuf of the ring-buffer,
ordered by their unique ID, assigned during the first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer size
will remain unmodified and the splice enabling functions will in reality
simply memcpy the data instead of swapping subbufs.

Signed-off-by: Vincent Donnefort

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index fa802db216f9..0841ba8bab14 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -6,6 +6,8 @@ #include #include +#include <uapi/linux/trace_mmap.h> + struct trace_buffer; struct ring_buffer_iter; @@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node); #define trace_rb_cpu_prepare NULL #endif +int ring_buffer_map(struct trace_buffer *buffer, int cpu); +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu); +struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu, + unsigned long pgoff); +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu); #endif /* _LINUX_RING_BUFFER_H */ diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h new file mode 100644 index ..ffcd8dfcaa4f --- /dev/null +++ b/include/uapi/linux/trace_mmap.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _TRACE_MMAP_H_ +#define _TRACE_MMAP_H_ + +#include <linux/types.h> + +/** + * struct trace_buffer_meta - Ring-buffer Meta-page description + * @meta_page_size: Size
of this meta-page. + * @meta_struct_len: Size of this structure. + * @subbuf_size: Size of each sub-buffer. + * @nr_subbufs:Number of subbfs in the ring-buffer, including the reader. + * @reader.lost_events:Number of events lost at the time of the reader swap. + * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1] + * @reader.read: Number of bytes read on the reader subbuf. + * @flags: Placeholder for now, 0 until new features are supported. + * @entries: Number of entries in the ring-buffer. + * @overrun: Number of entries lost in the ring-buffer. + * @read: Number of entries that have been read. + * @Reserved1: Reserved for future use. + * @Reserved2: Reserved for future use. + */ +struct trace_buffer_meta { + __u32 meta_page_size; + __u32 meta_struct_len; + + __u32 subbuf_size; + __u32 nr_subbufs; + + struct { + __u64 lost_events; + __u32 id; + __u32 read; + } reader; + + __u64 flags; + + __u64 entries; + __u64 overrun; + __u64 read; + + __u64 Reserved1; + __u64 Reserved2; +}; + +#endif /* _TRACE_MMAP_H_ */ diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index ca796675c0a1..8d4f4b88cba2 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -338,6 +339,7 @@ struct buffer_page { local_t entries; /* entries on this page */ unsigned longreal_end; /* real end of data */ unsigned order; /* order of the page */ + u32 id;/* ID for external mapping */ struct buffer_data_page *page; /* Actual data page */ }; @@ -484,6 +486,12 @@ struct ring_buffer_per_cpu { u64 read_stamp; /* pages removed since last reset */ unsigned long pages_removed; + + unsigned intmapped; + struct mutexmapping_lock; + unsigned long *subbuf_ids;/* ID to subbuf addr */ + struct trace_buffer_meta*meta_page; + /* ring buffer pages to update, > 0 to add, < 0 to remove */ longnr_pages_to_update; struct list_headnew_pages; /* new pages to add */ @@ -1548,6 
+1556,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters); init_waitqueue_head(&cpu_buffer->irq_work.waiters); init_waitqueue_head(&cpu_buffer->irq_work.full_w
[PATCH v17 1/6] ring-buffer: Zero ring-buffer sub-buffers
In preparation for the ring-buffer memory mapping where each subbuf will be accessible to user-space, zero all the page allocations. Signed-off-by: Vincent Donnefort Reviewed-by: Masami Hiramatsu (Google) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index fd4bfe3ecf01..ca796675c0a1 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1472,7 +1472,8 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, list_add(&bpage->list, pages); - page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags, + page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), + mflags | __GFP_ZERO, cpu_buffer->buffer->subbuf_order); if (!page) goto free_pages; @@ -1557,7 +1558,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) cpu_buffer->reader_page = bpage; - page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, cpu_buffer->buffer->subbuf_order); + page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO, + cpu_buffer->buffer->subbuf_order); if (!page) goto fail_free_reader; bpage->page = page_address(page); @@ -5525,7 +5527,8 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu) if (bpage->data) goto out; - page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_NORETRY, + page = alloc_pages_node(cpu_to_node(cpu), + GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO, cpu_buffer->buffer->subbuf_order); if (!page) { kfree(bpage); -- 2.43.0.687.g38aa6559b0-goog
[PATCH v17 0/6] Introducing trace buffer mapping by user-space
The tracing ring-buffers can be stored on disk or sent over the network
without any copy via splice. However, the latter doesn't allow real-time
processing of the traces. A solution is to give userspace direct access to the
ring-buffer pages via a mapping. An application can now become a consumer of
the ring-buffer, in a similar fashion to what trace_pipe offers.

Support for this new feature can already be found in libtracefs from version
1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.

Vincent

v16 -> v17:
  * Documentation and comments improvements.
  * Create get/put_snapshot_map() for clearer code.
  * Replace kzalloc with kcalloc.
  * Fix -ENOMEM handling in rb_alloc_meta_page().
  * Move flush(cpu_buffer->reader_page) behind the reader lock.
  * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
    mutex. (removes READ_ONCE/WRITE_ONCE accesses).

v15 -> v16:
  * Add comment for the dcache flush.
  * Remove now unnecessary WRITE_ONCE for the meta-page.

v14 -> v15:
  * Add meta-page and reader-page flush. Intends to fix the mapping
    for VIVT and aliasing-VIPT data caches.
  * -EPERM on VM_EXEC.
  * Fix build warning !CONFIG_TRACER_MAX_TRACE.

v13 -> v14:
  * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
  * on unmap, sync meta-page teardown with the reader_lock instead of
    the synchronize_rcu.
  * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
    (intends to fix a lockdep issue)
  * Add kerneldoc for flags and Reserved fields.
  * Add kselftest for snapshot/map mutual exclusion.

v12 -> v13:
  * Swap subbufs_{touched,lost} for Reserved fields.
  * Add a flag field in the meta-page.
  * Fix CONFIG_TRACER_MAX_TRACE.
  * Rebase on top of trace/urgent.
  * Add a comment for try_unregister_trigger()

v11 -> v12:
  * Fix code sample mmap bug.
  * Add logging in sample code.
  * Reset tracer in selftest.
  * Add a refcount for the snapshot users.
  * Prevent mapping when there are snapshot users and vice versa.
  * Refine the meta-page.
  * Fix types in the meta-page.
* Collect Reviewed-by. v10 -> v11: * Add Documentation and code sample. * Add a selftest. * Move all the update to the meta-page into a single rb_update_meta_page(). * rb_update_meta_page() is now called from ring_buffer_map_get_reader() to fix NOBLOCK callers. * kerneldoc for struct trace_meta_page. * Add a patch to zero all the ring-buffer allocations. v9 -> v10: * Refactor rb_update_meta_page() * In-loop declaration for foreach_subbuf_page() * Check for cpu_buffer->mapped overflow v8 -> v9: * Fix the unlock path in ring_buffer_map() * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a) v7 -> v8: * Drop the subbufs renaming into bpages * Use subbuf as a name when relevant v6 -> v7: * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/ * Support for subbufs * Rename subbufs into bpages v5 -> v6: * Rebase on next-20230802. * (unsigned long) -> (void *) cast for virt_to_page(). * Add a wait for the GET_READER_PAGE ioctl. * Move writer fields update (overrun/pages_lost/entries/pages_touched) in the irq_work. * Rearrange id in struct buffer_page. * Rearrange the meta-page. * ring_buffer_meta_page -> trace_buffer_meta_page. * Add meta_struct_len into the meta-page. v4 -> v5: * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3) v3 -> v4: * Add to the meta-page: - pages_lost / pages_read (allow to compute how full is the ring-buffer) - read (allow to compute how many entries can be read) - A reader_page struct. * Rename ring_buffer_meta_header -> ring_buffer_meta * Rename ring_buffer_get_reader_page -> ring_buffer_map_get_reader_page * Properly consume events on ring_buffer_map_get_reader_page() with rb_advance_reader(). 
v2 -> v3: * Remove data page list (for non-consuming read) ** Implies removing order > 0 meta-page * Add a new meta page field ->read * Rename ring_buffer_meta_page_header into ring_buffer_meta_header v1 -> v2: * Hide data_pages from the userspace struct * Fix META_PAGE_MAX_PAGES * Support for order > 0 meta-page * Add missing page->mapping. Vincent Donnefort (6): ring-buffer: Zero ring-buffer sub-buffers ring-buffer: Introducing ring-buffer mapping functions tracing: Add snapshot refcount tracing: Allow user-space mapping of the ring-buffer Documentation: tracing: Add ring-buffer mapping ring-buffer/selftest: Add ring-buffer mapping test Documentation/trace/index.rst | 1 + Documentation/trace/ring-buffer-map.rst | 106 + include/linux/ring_buffer.h | 7 + include/uapi/linux/trace_mmap.h | 48 +++ kernel/trace/ring_buffer.c| 385 +- kernel/trace/trace.c | 234 ++- kernel/trace/trace.h
Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.
On Tue, 13 Feb 2024 10:07:35 +0100, Michael S. Tsirkin wrote:
>
> On Tue, Feb 13, 2024 at 10:02:24AM +0100, Takashi Iwai wrote:
> > On Tue, 13 Feb 2024 09:51:30 +0100,
> > Aiswarya Cyriac wrote:
> > >
> > > Fix the following warning when building virtio_snd driver.
> > >
> > > "
> > > *** CID 1583619: Uninitialized variables (UNINIT)
> > > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
> > > 288
> > > 289 break;
> > > 290 }
> > > 291
> > > 292 kfree(tlv);
> > > 293
> > > vvv CID 1583619: Uninitialized variables (UNINIT)
> > > vvv Using uninitialized value "rc".
> > > 294 return rc;
> > > 295 }
> > > 296
> > > 297 /**
> > > 298 * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED
> > > element type.
> > > 299 * @snd: VirtIO sound device.
> > > "
> > >
> > > Signed-off-by: Anton Yakovlev
> > > Signed-off-by: Aiswarya Cyriac
> > > Reported-by: coverity-bot
> > > Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
> > > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")
> >
> > Thanks, applied.
> >
> >
> > Takashi
>
> Why did you apply it directly? The patch isn't great IMHO.
> Why not give people a couple of days to review?

Sure, we can wait. I applied it quickly just because it's an obvious fix
and you hadn't responded to the original patch in over a month.

thanks,

Takashi
Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.
On Tue, Feb 13, 2024 at 10:02:24AM +0100, Takashi Iwai wrote: > On Tue, 13 Feb 2024 09:51:30 +0100, > Aiswarya Cyriac wrote: > > > > Fix the following warning when building virtio_snd driver. > > > > " > > *** CID 1583619: Uninitialized variables (UNINIT) > > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op() > > 288 > > 289 break; > > 290 } > > 291 > > 292 kfree(tlv); > > 293 > > vvv CID 1583619: Uninitialized variables (UNINIT) > > vvv Using uninitialized value "rc". > > 294 return rc; > > 295 } > > 296 > > 297 /** > > 298 * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED > > element type. > > 299 * @snd: VirtIO sound device. > > " > > > > Signed-off-by: Anton Yakovlev > > Signed-off-by: Aiswarya Cyriac > > Reported-by: coverity-bot > > Addresses-Coverity-ID: 1583619 ("Uninitialized variables") > > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls") > > Thanks, applied. > > > Takashi Why did you apply it directly? The patch isn't great IMHO. Why not give people a couple of days to review? -- MST
Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.
On Tue, Feb 13, 2024 at 09:51:30AM +0100, Aiswarya Cyriac wrote: > Fix the following warning when building virtio_snd driver. > > " > *** CID 1583619: Uninitialized variables (UNINIT) > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op() > 288 > 289 break; > 290 } > 291 > 292 kfree(tlv); > 293 > vvv CID 1583619: Uninitialized variables (UNINIT) > vvv Using uninitialized value "rc". > 294 return rc; > 295 } > 296 > 297 /** > 298 * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED > element type. > 299 * @snd: VirtIO sound device. > " > > Signed-off-by: Anton Yakovlev > Signed-off-by: Aiswarya Cyriac > Reported-by: coverity-bot > Addresses-Coverity-ID: 1583619 ("Uninitialized variables") > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls") I don't know enough about ALSA to say whether the patch is correct. But the commit log needs work: please, do not "fix warnings" - analyse the code and explain whether there is a real issue and if yes what is it and how it can trigger. Is an invalid op_flag ever passed? If it's just a coverity false positive it might be ok to work around that but document this. > --- > sound/virtio/virtio_kctl.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/sound/virtio/virtio_kctl.c b/sound/virtio/virtio_kctl.c > index 0c6ac74aca1e..d7a160c5db03 100644 > --- a/sound/virtio/virtio_kctl.c > +++ b/sound/virtio/virtio_kctl.c > @@ -286,6 +286,11 @@ static int virtsnd_kctl_tlv_op(struct snd_kcontrol > *kcontrol, int op_flag, > else > rc = virtsnd_ctl_msg_send(snd, msg, &sg, NULL, false); > > + break; > + default: > + virtsnd_ctl_msg_unref(msg); > + rc = -EINVAL; > + There's already virtsnd_ctl_msg_unref call above. Also don't we need virtsnd_ctl_msg_unref on other error paths such as EFAULT? Unify error handling to fix it all then? > break; > } > > -- > 2.43.0
Re: [PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.
On Tue, 13 Feb 2024 09:51:30 +0100, Aiswarya Cyriac wrote: > > Fix the following warning when building virtio_snd driver. > > " > *** CID 1583619: Uninitialized variables (UNINIT) > sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op() > 288 > 289 break; > 290 } > 291 > 292 kfree(tlv); > 293 > vvv CID 1583619: Uninitialized variables (UNINIT) > vvv Using uninitialized value "rc". > 294 return rc; > 295 } > 296 > 297 /** > 298 * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED > element type. > 299 * @snd: VirtIO sound device. > " > > Signed-off-by: Anton Yakovlev > Signed-off-by: Aiswarya Cyriac > Reported-by: coverity-bot > Addresses-Coverity-ID: 1583619 ("Uninitialized variables") > Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls") Thanks, applied. Takashi
[PATCH] ALSA: virtio: Fix "Coverity: virtsnd_kctl_tlv_op(): Uninitialized variables" warning.
Fix the following warning when building the virtio_snd driver.

"
*** CID 1583619: Uninitialized variables (UNINIT)
sound/virtio/virtio_kctl.c:294 in virtsnd_kctl_tlv_op()
288
289 break;
290 }
291
292 kfree(tlv);
293
vvv CID 1583619: Uninitialized variables (UNINIT)
vvv Using uninitialized value "rc".
294 return rc;
295 }
296
297 /**
298 * virtsnd_kctl_get_enum_items() - Query items for the ENUMERATED element type.
299 * @snd: VirtIO sound device.
"

Signed-off-by: Anton Yakovlev
Signed-off-by: Aiswarya Cyriac
Reported-by: coverity-bot
Addresses-Coverity-ID: 1583619 ("Uninitialized variables")
Fixes: d6568e3de42d ("ALSA: virtio: add support for audio controls")
---
 sound/virtio/virtio_kctl.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/sound/virtio/virtio_kctl.c b/sound/virtio/virtio_kctl.c
index 0c6ac74aca1e..d7a160c5db03 100644
--- a/sound/virtio/virtio_kctl.c
+++ b/sound/virtio/virtio_kctl.c
@@ -286,6 +286,11 @@ static int virtsnd_kctl_tlv_op(struct snd_kcontrol *kcontrol, int op_flag,
 	else
 		rc = virtsnd_ctl_msg_send(snd, msg, &sg, NULL, false);
 
+		break;
+	default:
+		virtsnd_ctl_msg_unref(msg);
+		rc = -EINVAL;
+
 		break;
 	}
 
-- 
2.43.0
Re: [PATCH V1] vdpa: suspend and resume require DRIVER_OK
On Tue, Feb 13, 2024 at 8:49 AM Eugenio Perez Martin wrote: > > On Mon, Feb 12, 2024 at 9:20 AM Michael S. Tsirkin wrote: > > > > On Fri, Feb 09, 2024 at 02:29:59PM -0800, Steve Sistare wrote: > > > Calling suspend or resume requires VIRTIO_CONFIG_S_DRIVER_OK, for all > > > vdpa devices. > > > > > > Suggested-by: Eugenio Perez Martin " > > > Signed-off-by: Steve Sistare > > > > I don't think failing suspend or resume makes sense though - > > e.g. practically failing suspend will just prevent sleeping I think - > > why should guest not having driver loaded prevent > > system suspend? > > > > In the QEMU case the vhost device has not started, so QEMU should > allow the system suspension. > > I haven't tested the QEMU behavior on suspend (not poweroff) with the > guest driver loaded, but I think QEMU should indeed block the > suspension, as there is no way to recover the device after that > without the guest cooperation? > > > there's also state such as features set which does need to be > > preserved. > > > > That's true if the device does not support resuming. Well, in the > particular case of features, maybe we need to keep it, as userspace > could call VHOST_GET_FEATURES. But maybe we can clean some things, > right. > > > I think the thing to do is to skip invoking suspend/resume callback, and in > > fact checking suspend/resume altogether. > > > > I don't follow this. What should be done in this cases by QEMU? > 1) The device does not support suspend > 2) The device support suspend but not resume > > In my opinion 1) should be forbidden, as we don't support to resume > the device properly, and 2) can be allowed by fetching all the state. > Ok I missed the whole other thread, everything is clear now. Thanks! > Thanks! 
> > > > --- > > > drivers/vhost/vdpa.c | 6 ++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > > > index bc4a51e4638b..ce1882acfc3b 100644 > > > --- a/drivers/vhost/vdpa.c > > > +++ b/drivers/vhost/vdpa.c > > > @@ -598,6 +598,9 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v) > > > if (!ops->suspend) > > > return -EOPNOTSUPP; > > > > > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) > > > + return -EINVAL; > > > + > > > ret = ops->suspend(vdpa); > > > if (!ret) > > > v->suspended = true; > > > @@ -618,6 +621,9 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v) > > > if (!ops->resume) > > > return -EOPNOTSUPP; > > > > > > + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)) > > > + return -EINVAL; > > > + > > > ret = ops->resume(vdpa); > > > if (!ret) > > > v->suspended = false; > > > -- > > > 2.39.3 > > > >