Re: [PATCH] tracing: Have trace_event_file have ref counters

2023-11-01 Thread Steven Rostedt
On Wed, 1 Nov 2023 22:32:54 -0400
Steven Rostedt  wrote:

> > Ouch! I thought the file descriptor was held by the process that opened it.
> 
> Well, the struct *filp is, but not the filp->private that points to the
> struct trace_event_file *file.

That was supposed to be "struct file *filp"

-- Steve



Re: [PATCH] tracing: Have trace_event_file have ref counters

2023-11-01 Thread Steven Rostedt
On Thu, 2 Nov 2023 11:14:33 +0900
Masami Hiramatsu (Google)  wrote:
> > 
> > What happens here is that the kprobe event creates a trace_event_file
> > "file" descriptor that represents the file in tracefs to the event. It
> > maintains state of the event (is it enabled for the given instance?).
> > Opening the "enable" file gets a reference to the event "file" descriptor
> > via the open file descriptor. When the kprobe event is deleted, the file is
> > also deleted from the tracefs system which also frees the event "file"
> > descriptor.  
> 
> Ouch! I thought the file descriptor was held by the process that opened it.

Well, the struct *filp is, but not the filp->private that points to the
struct trace_event_file *file.

> 
> > 
> > But as the tracefs file is still opened by user space, it will not be
> > totally removed until the final dput() is called on it. But this is not
> > true with the event "file" descriptor that is already freed. If the user
> > does a write to or simply closes the file descriptor it will reference the
> > event "file" descriptor that was just freed, causing a use-after-free bug.
> > 
> > To solve this, add a ref count to the event "file" descriptor as well as a
> > new flag called "FREED". The "file" will not be freed until the last
> > reference is released. But the FREED flag will be set when the event is
> > removed to prevent any more modifications to that event from happening,
> > even if there's still a reference to the event "file" descriptor.
> > 
> > Link: 
> > https://lore.kernel.org/linux-trace-kernel/2023103131.1e705...@gandalf.local.home/
> >   
> 
> Looks good to me.
> 
> Reviewed-by: Masami Hiramatsu (Google) 
> 
> BTW, can we add some tracefs selftests to ftracetest or as an independent test?

Yes, I was thinking about this, but that will have to wait till after this
gets in mainline. I'm already way behind schedule.

Thanks,

-- Steve



Re: [PATCH] tracing: Have trace_event_file have ref counters

2023-11-01 Thread Google
On Tue, 31 Oct 2023 12:24:53 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> The following can crash the kernel:
> 
>  # cd /sys/kernel/tracing
>  # echo 'p:sched schedule' > kprobe_events
>  # exec 5>>events/kprobes/sched/enable
>  # > kprobe_events
>  # exec 5>&-
> 
> The above commands:
> 
>  1. Change directory to the tracefs directory
>  2. Create a kprobe event (doesn't matter what one)
>  3. Open bash file descriptor 5 on the enable file of the kprobe event
>  4. Delete the kprobe event (removes the files too)
>  5. Close the bash file descriptor 5
> 
> The above causes a crash!
> 
>  BUG: kernel NULL pointer dereference, address: 0028
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x) - not-present page
>  PGD 0 P4D 0
>  Oops:  [#1] PREEMPT SMP PTI
>  CPU: 6 PID: 877 Comm: bash Not tainted 
> 6.5.0-rc4-test-8-g2c6b6b1029d4-dirty #186
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.2-debian-1.16.2-1 04/01/2014
>  RIP: 0010:tracing_release_file_tr+0xc/0x50
> 
> What happens here is that the kprobe event creates a trace_event_file
> "file" descriptor that represents the file in tracefs to the event. It
> maintains state of the event (is it enabled for the given instance?).
> Opening the "enable" file gets a reference to the event "file" descriptor
> via the open file descriptor. When the kprobe event is deleted, the file is
> also deleted from the tracefs system which also frees the event "file"
> descriptor.

Ouch! I thought the file descriptor was held by the process that opened it.

> 
> But as the tracefs file is still opened by user space, it will not be
> totally removed until the final dput() is called on it. But this is not
> true with the event "file" descriptor that is already freed. If the user
> does a write to or simply closes the file descriptor it will reference the
> event "file" descriptor that was just freed, causing a use-after-free bug.
> 
> To solve this, add a ref count to the event "file" descriptor as well as a
> new flag called "FREED". The "file" will not be freed until the last
> reference is released. But the FREED flag will be set when the event is
> removed to prevent any more modifications to that event from happening,
> even if there's still a reference to the event "file" descriptor.
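
[ A minimal sketch of the ref count plus FREED flag scheme described above.
  The helper names here are hypothetical and for illustration only; they are
  not the functions the patch itself adds to kernel/trace/trace_events.c: ]

	/* Illustrative only, not the actual kernel code */
	void event_file_get(struct trace_event_file *file)
	{
		atomic_inc(&file->ref);
	}

	void event_file_put(struct trace_event_file *file)
	{
		if (atomic_dec_and_test(&file->ref))
			kfree(file);
	}

	/* Called when the kprobe event (and its tracefs files) is deleted */
	void event_file_mark_freed(struct trace_event_file *file)
	{
		file->flags |= EVENT_FILE_FL_FREED;
		event_file_put(file);		/* drop the creation reference */
	}

	/* Writers (e.g. via the "enable" file) must now check the flag */
	int event_file_modify(struct trace_event_file *file)
	{
		if (file->flags & EVENT_FILE_FL_FREED)
			return -ENODEV;		/* event gone, memory still valid */
		/* ...perform the update... */
		return 0;
	}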
> 
> Link: 
> https://lore.kernel.org/linux-trace-kernel/2023103131.1e705...@gandalf.local.home/
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

BTW, can we add some tracefs selftests to ftracetest or as an independent test?

Thanks,

> Cc: sta...@vger.kernel.org
> Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and 
> filter files")
> Reported-by: Beau Belgrave 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  include/linux/trace_events.h   |  4 
>  kernel/trace/trace.c   | 15 +++
>  kernel/trace/trace.h   |  3 +++
>  kernel/trace/trace_events.c| 31 ++
>  kernel/trace/trace_events_filter.c |  3 +++
>  5 files changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 12207dc6722d..696f8dc4aa53 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -492,6 +492,7 @@ enum {
>   EVENT_FILE_FL_TRIGGER_COND_BIT,
>   EVENT_FILE_FL_PID_FILTER_BIT,
>   EVENT_FILE_FL_WAS_ENABLED_BIT,
> + EVENT_FILE_FL_FREED_BIT,
>  };
>  
>  extern struct trace_event_file *trace_get_event_file(const char *instance,
> @@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd 
> *cmd, ...);
>   *  TRIGGER_COND  - When set, one or more triggers has an associated filter
>   *  PID_FILTER- When set, the event is filtered based on pid
>   *  WAS_ENABLED   - Set when enabled to know to clear trace on module removal
> + *  FREED - File descriptor is freed, all fields should be 
> considered invalid
>   */
>  enum {
>   EVENT_FILE_FL_ENABLED   = (1 << EVENT_FILE_FL_ENABLED_BIT),
> @@ -643,6 +645,7 @@ enum {
>   EVENT_FILE_FL_TRIGGER_COND  = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
>   EVENT_FILE_FL_PID_FILTER= (1 << EVENT_FILE_FL_PID_FILTER_BIT),
>   EVENT_FILE_FL_WAS_ENABLED   = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
> + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT),
>  };
>  
>  struct trace_event_file {
> @@ -671,6 +674,7 @@ struct trace_event_file {
>* caching and such. Which is mostly OK ;-)
>*/
>   unsigned long   flags;
> + atomic_t ref;   /* ref count for opened files */
>   atomic_t sm_ref; /* soft-mode reference counter */
>   atomic_t tm_ref; /* trigger-mode reference counter */
>  };
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2539cfc20a97..9aebf904ff97 100644
> --- a/kernel/trace/trace.c
> +++ 

Re: [PATCH v6 7/8] eventfs: Remove special processing of dput() of events directory

2023-11-01 Thread Google
On Wed, 01 Nov 2023 13:25:48 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> The top level events directory is no longer special with regards to how it
> should be deleted. Remove the extra processing for it in
> eventfs_set_ei_status_free().
> 
Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

> Cc: Ajay Kaher 
> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v5: 
> https://lkml.kernel.org/r/20231031223421.185413...@goodmis.org
> 
> - Resynced to this patch series
> 
>  fs/tracefs/event_inode.c | 19 ++-
>  1 file changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 0a04ae0ca8c8..0087a3f455f1 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -274,28 +274,11 @@ static void free_ei(struct eventfs_inode *ei)
>   */
>  void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry 
> *dentry)
>  {
> - struct tracefs_inode *ti_parent;
>   struct eventfs_inode *ei;
>   int i;
>  
> - /* The top level events directory may be freed by this */
> - if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
> - mutex_lock(&eventfs_mutex);
> - ei = ti->private;
> - /* Nothing should access this, but just in case! */
> - ti->private = NULL;
> - mutex_unlock(&eventfs_mutex);
> -
> - free_ei(ei);
> - return;
> - }
> -
>   mutex_lock(&eventfs_mutex);
>  
> - ti_parent = get_tracefs(dentry->d_parent->d_inode);
> - if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - goto out;
> -
>   ei = dentry->d_fsdata;
>   if (!ei)
>   goto out;
> @@ -920,6 +903,8 @@ struct eventfs_inode *eventfs_create_events_dir(const 
> char *name, struct dentry
>   inode->i_op = _root_dir_inode_operations;
>   inode->i_fop = _file_operations;
>  
> + dentry->d_fsdata = ei;
> +
>   /* directory inodes start off with i_nlink == 2 (for "." entry) */
>   inc_nlink(inode);
>   d_instantiate(dentry, inode);
> -- 
> 2.42.0


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v8 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-11-01 Thread Verma, Vishal L
On Thu, 2023-11-02 at 09:16 +0800, Huang, Ying wrote:
> Vishal Verma  writes:
> 
[..]
> > +
> > +static int create_altmaps_and_memory_blocks(int nid, struct memory_group 
> > *group,
> > +   u64 start, u64 size)
> > +{
> > +   unsigned long memblock_size = memory_block_size_bytes();
> > +   u64 cur_start;
> > +   int ret;
> > +
> > +   for (cur_start = start; cur_start < start + size;
> > +    cur_start += memblock_size) {
> > +   struct mhp_params params = { .pgprot =
> > +    
> > pgprot_mhp(PAGE_KERNEL) };
> > +   struct vmem_altmap mhp_altmap = {
> > +   .base_pfn = PHYS_PFN(cur_start),
> > +   .end_pfn = PHYS_PFN(cur_start + memblock_size - 1),
> > +   };
> > +
> > +   mhp_altmap.free = memory_block_memmap_on_memory_pages();
> > +   params.altmap = kmemdup(&mhp_altmap, sizeof(struct 
> > vmem_altmap),
> > +   GFP_KERNEL);
> > +   if (!params.altmap)
> > +   return -ENOMEM;
> 
> Use "goto out" here too?

Hm, yes I suppose we want to clean up previous iterations of the loop -
I'll make this change.

> 
> > +
> > +   /* call arch's memory hotadd */
> > +   ret = arch_add_memory(nid, cur_start, memblock_size, &params);
> > +   if (ret < 0) {
> > +   kfree(params.altmap);
> > +   goto out;
> > +   }
> > +
> > +   /* create memory block devices after memory was added */
> > +   ret = create_memory_block_devices(cur_start, memblock_size,
> > + params.altmap, group);
> > +   if (ret) {
> > +   arch_remove_memory(cur_start, memblock_size, NULL);
> > +   kfree(params.altmap);
> 
> How about move arch_remove_memory() and kree() to error path and use
> different label?

I thought of this, but it got slightly awkward because of the scope of
'params' (declared/allocated within the loop), just kfree'ing in that
scope looked cleaner..
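
[ For reference, a rough sketch of the unwinding agreed to above, where a
  kmemdup() failure also falls through to the cleanup label so that chunks
  added by earlier loop iterations are removed. This keeps the v8 structure
  and is only an illustration, not the actual follow-up patch: ]

	static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
						    u64 start, u64 size)
	{
		unsigned long memblock_size = memory_block_size_bytes();
		u64 cur_start;
		int ret;

		for (cur_start = start; cur_start < start + size;
		     cur_start += memblock_size) {
			struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
			struct vmem_altmap mhp_altmap = {
				.base_pfn = PHYS_PFN(cur_start),
				.end_pfn = PHYS_PFN(cur_start + memblock_size - 1),
			};

			mhp_altmap.free = memory_block_memmap_on_memory_pages();
			params.altmap = kmemdup(&mhp_altmap, sizeof(mhp_altmap), GFP_KERNEL);
			if (!params.altmap) {
				ret = -ENOMEM;
				goto out;	/* unwind earlier iterations too */
			}

			/* call arch's memory hotadd */
			ret = arch_add_memory(nid, cur_start, memblock_size, &params);
			if (ret < 0) {
				kfree(params.altmap);
				goto out;
			}

			/* create memory block devices after memory was added */
			ret = create_memory_block_devices(cur_start, memblock_size,
							  params.altmap, group);
			if (ret) {
				arch_remove_memory(cur_start, memblock_size, NULL);
				kfree(params.altmap);
				goto out;
			}
		}

		return 0;
	out:
		if (cur_start != start)
			remove_memory_blocks_and_altmaps(start, cur_start - start);
		return ret;
	}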



Re: [PATCH v8 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-11-01 Thread Huang, Ying
Vishal Verma  writes:

> The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
> 'memblock_size' chunks of memory being added. Adding a larger span of
> memory precludes memmap_on_memory semantics.
>
> For users of hotplug such as kmem, large amounts of memory might get
> added from the CXL subsystem. In some cases, this amount may exceed the
> available 'main memory' to store the memmap for the memory being added.
> In this case, it is useful to have a way to place the memmap on the
> memory being added, even if it means splitting the addition into
> memblock-sized chunks.
>
> Change add_memory_resource() to loop over memblock-sized chunks of
> memory if caller requested memmap_on_memory, and if other conditions for
> it are met. Teach try_remove_memory() to also expect that a memory
> range being removed might have been split up into memblock sized chunks,
> and to loop through those as needed.
>
> This does preclude being able to use PUD mappings in the direct map; a
> proposal to how this could be optimized in the future is laid out
> here[1].
>
> [1]: 
> https://lore.kernel.org/linux-mm/b6753402-2de9-25b2-36e9-eacd49752...@redhat.com/
>
> Cc: Andrew Morton 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Suggested-by: David Hildenbrand 
> Reviewed-by: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  mm/memory_hotplug.c | 213 
> ++--
>  1 file changed, 138 insertions(+), 75 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6be7de9efa55..d242e49d7f7b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1380,6 +1380,84 @@ static bool mhp_supports_memmap_on_memory(unsigned 
> long size)
>   return arch_supports_memmap_on_memory(vmemmap_size);
>  }
>  
> +static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
> +{
> + unsigned long memblock_size = memory_block_size_bytes();
> + u64 cur_start;
> +
> + /*
> +  * For memmap_on_memory, the altmaps were added on a per-memblock
> +  * basis; we have to process each individual memory block.
> +  */
> + for (cur_start = start; cur_start < start + size;
> +  cur_start += memblock_size) {
> + struct vmem_altmap *altmap = NULL;
> + struct memory_block *mem;
> +
> + mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
> + WARN_ON_ONCE(!mem);
> + if (!mem)
> + continue;
> +
> + altmap = mem->altmap;
> + mem->altmap = NULL;
> +
> + remove_memory_block_devices(cur_start, memblock_size);
> +
> + arch_remove_memory(cur_start, memblock_size, altmap);
> +
> + /* Verify that all vmemmap pages have actually been freed. */
> + WARN(altmap->alloc, "Altmap not fully unmapped");
> + kfree(altmap);
> + }
> +}
> +
> +static int create_altmaps_and_memory_blocks(int nid, struct memory_group 
> *group,
> + u64 start, u64 size)
> +{
> + unsigned long memblock_size = memory_block_size_bytes();
> + u64 cur_start;
> + int ret;
> +
> + for (cur_start = start; cur_start < start + size;
> +  cur_start += memblock_size) {
> + struct mhp_params params = { .pgprot =
> +  pgprot_mhp(PAGE_KERNEL) };
> + struct vmem_altmap mhp_altmap = {
> + .base_pfn = PHYS_PFN(cur_start),
> + .end_pfn = PHYS_PFN(cur_start + memblock_size - 1),
> + };
> +
> + mhp_altmap.free = memory_block_memmap_on_memory_pages();
> + params.altmap = kmemdup(&mhp_altmap, sizeof(struct vmem_altmap),
> + GFP_KERNEL);
> + if (!params.altmap)
> + return -ENOMEM;

Use "goto out" here too?

> +
> + /* call arch's memory hotadd */
> + ret = arch_add_memory(nid, cur_start, memblock_size, &params);
> + if (ret < 0) {
> + kfree(params.altmap);
> + goto out;
> + }
> +
> + /* create memory block devices after memory was added */
> + ret = create_memory_block_devices(cur_start, memblock_size,
> +   params.altmap, group);
> + if (ret) {
> + arch_remove_memory(cur_start, memblock_size, NULL);
> + kfree(params.altmap);

How about move arch_remove_memory() and kree() to error path and use
different label?

--
Best Regards,
Huang, Ying

> + goto out;
> + }
> + }
> +
> + return 0;
> +out:
> + if (ret && (cur_start != start))
> + remove_memory_blocks_and_altmaps(start, cur_start - 

[GIT PULL] NVDIMM for 6.7

2023-11-01 Thread Ira Weiny
Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.7

... to get updates to the nvdimm tree.  They are a mix of bug fixes and updates
to interfaces used by nvdimm.

Bug fixes include:
Fix a sleep during spinlock in PREEMPT_RT kernels when getting a BTT
lane
Check for kstrdup memory failure in region probe
Use the correct variables in the nvdimm_badblocks_populate() kdoc block

Updates to interfaces include:
Use devm_kstrdup to manage memory better
Allow class structures to be declared at build time
Start using __counted_by to help with bounds checking
Remove use of the deprecated strncpy

They have all been in -next for more than a week with no reported issues.

---

The following changes since commit 6465e260f48790807eef06b583b38ca9789b6072:

  Linux 6.6-rc3 (2023-09-24 14:31:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.7

for you to fetch changes up to 9ea459e477dc09370cdd8ee13b61aefe8cd1f20a:

  libnvdimm: remove kernel-doc warnings: (2023-10-18 09:48:05 -0700)


libnvdimm updates for v6.7

- updates to deprecated and changed interfaces
- bug/kdoc fixes


Chen Ni (1):
  libnvdimm/of_pmem: Use devm_kstrdup instead of kstrdup and check its 
return value

Greg Kroah-Hartman (1):
  testing: nvdimm: make struct class structures constant

Justin Stitt (1):
  dax: refactor deprecated strncpy

Kees Cook (1):
  libnvdimm: Annotate struct nd_region with __counted_by

Tomas Glozar (1):
  nd_btt: Make BTT lanes preemptible

Zhu Wang (1):
  libnvdimm: remove kernel-doc warnings:

 drivers/dax/bus.c  |  2 +-
 drivers/nvdimm/badrange.c  |  4 ++--
 drivers/nvdimm/nd.h|  2 +-
 drivers/nvdimm/of_pmem.c   |  8 +++-
 drivers/nvdimm/region_devs.c   | 10 +-
 tools/testing/nvdimm/test/ndtest.c | 17 +
 tools/testing/nvdimm/test/nfit.c   | 14 +++---
 7 files changed, 32 insertions(+), 25 deletions(-)



Re: [PATCH v6 5/8] eventfs: Hold eventfs_mutex when calling callback functions

2023-11-01 Thread Google
On Wed, 01 Nov 2023 13:25:46 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> The callback function that is used to create inodes and dentries is not
> protected by anything and the data that is passed to it could become
> stale. After eventfs_remove_dir() is called by the tracing system, it is
> free to remove the events that are associated to that directory.
> Unfortunately, that means the callbacks must not be called after that.
> 
>  CPU0 CPU1
>   
>  eventfs_root_lookup() {
>eventfs_remove_dir() {
> mutex_lock(&eventfs_mutex);
> ei->is_freed = set;
> mutex_unlock(&eventfs_mutex);
>}
>kfree(event_call);
> 
> for (...) {
>   entry = &ei->entries[i];
>   r = entry->callback() {
>   call = data;// call == event_call above
>   if (call->flags ...)
> 
>  [ USE AFTER FREE BUG ]
> 
> The safest way to protect this is to wrap the callback with:
> 
>  mutex_lock(&eventfs_mutex);
>  if (!ei->is_freed)
>  r = entry->callback();
>  else
>  r = -1;
>  mutex_unlock(&eventfs_mutex);
> 
> This will make sure that the callback will not be called after it is
> freed. But now it needs to be known that the callback is called while
> holding internal eventfs locks, and that it must not call back into the
> eventfs / tracefs system. There's no reason it should anyway, but document
> that as well.
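
[ A hypothetical example of a callback that is safe under this rule,
  assuming the four-argument callback signature used at the call sites
  below: it only fills in the mode/data/fops it was asked about and never
  calls back into eventfs or tracefs. The names are illustrative only: ]

	static int example_events_callback(const char *name, umode_t *mode,
					   void **data, const struct file_operations **fops)
	{
		/*
		 * *data may be inspected (it was supplied when the directory
		 * was created), but nothing here may create or remove eventfs
		 * entries, or otherwise call back into eventfs/tracefs.
		 */
		if (strcmp(name, "format") == 0) {
			*mode = 0444;			/* read-only file */
			*fops = &example_format_fops;	/* hypothetical fops */
			return 1;			/* > 0: create this file */
		}
		return 0;				/* 0: skip this entry */
	}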
> 
> Link: 
> https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=y50wpwae...@mail.gmail.com/
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks!

> Cc: Ajay Kaher 
> Reported-by: Linux Kernel Functional Testing 
> Reported-by: Naresh Kamboju 
> Tested-by: Linux Kernel Functional Testing 
> Tested-by: Naresh Kamboju 
> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v5: 
> https://lkml.kernel.org/r/20231031223420.778161...@goodmis.org
> 
> - Resynced to this patch series
> 
>  fs/tracefs/event_inode.c | 22 ++--
>  include/linux/tracefs.h  | 43 
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 93d08e552483..8ac9abf7a3d5 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -615,7 +615,13 @@ static struct dentry *eventfs_root_lookup(struct inode 
> *dir,
>   entry = &ei->entries[i];
>   if (strcmp(name, entry->name) == 0) {
>   void *cdata = data;
> - r = entry->callback(name, &mode, &cdata, &fops);
> + mutex_lock(&eventfs_mutex);
> + /* If ei->is_freed, then the event itself may be too */
> + if (!ei->is_freed)
> + r = entry->callback(name, &mode, &cdata, &fops);
> + else
> + r = -1;
> + mutex_unlock(&eventfs_mutex);
>   if (r <= 0)
>   continue;
>   ret = simple_lookup(dir, dentry, flags);
> @@ -749,7 +755,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
> struct file *file)
>   void *cdata = data;
>   entry = &ei->entries[i];
>   name = entry->name;
> - r = entry->callback(name, &mode, &cdata, &fops);
> + mutex_lock(&eventfs_mutex);
> + /* If ei->is_freed, then the event itself may be too */
> + if (!ei->is_freed)
> + r = entry->callback(name, &mode, &cdata, &fops);
> + else
> + r = -1;
> + mutex_unlock(&eventfs_mutex);
>   if (r <= 0)
>   continue;
>   d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, 
> false);
> @@ -819,6 +831,10 @@ static int dcache_readdir_wrapper(struct file *file, 
> struct dir_context *ctx)
>   *   data = A pointer to @data, and the callback may replace it, which will
>   * cause the file created to pass the new data to the open() call.
>   *   fops = the fops to use for the created file.
> + *
> + * NB. @callback is called while holding internal locks of the eventfs
> + * system. The callback must not call any code that might also call into
> + * the tracefs or eventfs system or it will risk creating a deadlock.
>   */
>  struct eventfs_inode *eventfs_create_dir(const char *name, struct 
> eventfs_inode *parent,
>const struct eventfs_entry *entries,
> @@ -878,6 +894,8 @@ struct eventfs_inode *eventfs_create_dir(const char 
> *name, struct eventfs_inode
>   * @data: The default data to pass to the files (an entry may override it).
>   *
>   * This function creates the top of the trace event directory.
> + *
> + * See eventfs_create_dir() 

Re: [PATCH v5 4/7] eventfs: Save ownership and mode

2023-11-01 Thread Steven Rostedt
On Thu, 2 Nov 2023 08:43:32 +0900
Masami Hiramatsu (Google)  wrote:

> On Tue, 31 Oct 2023 18:33:30 -0400
> Steven Rostedt  wrote:
> 
> > From: "Steven Rostedt (Google)" 
> > 
> > Now that inodes and dentries are created on the fly, they are also
> > reclaimed on memory pressure. Since the ownership and file mode are saved
> > in the inode, if they are freed, any changes to the ownership and mode
> > will be lost.  
> 
> Do we (need to) allow changing the ownership and mode of the eventfs files?
> I thought it was fixed on the files in tracefs...

Yes, it's the only way to allow non-root users access to the tracing 
directories.

> 
> Otherwise, the code itself looks good to me.
> 
> Reviewed-by: Masami Hiramatsu (Google) 
> 

Thanks!

-- Steve



Re: [PATCH v5 4/7] eventfs: Save ownership and mode

2023-11-01 Thread Google
On Tue, 31 Oct 2023 18:33:30 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> Now that inodes and dentries are created on the fly, they are also
> reclaimed on memory pressure. Since the ownership and file mode are saved
> in the inode, if they are freed, any changes to the ownership and mode
> will be lost.

Do we (need to) allow changing the ownership and mode of the eventfs files?
I thought it was fixed on the files in tracefs...

Otherwise, the code itself looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> 
> To counter this, if the user changes the permissions or ownership, save
> them, and when creating the inodes again, restore those changes.
> 
> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v4: 
> https://lkml.kernel.org/r/20231031193428.558586...@goodmis.org
> 
> - Rebased to this series
> 
>  fs/tracefs/event_inode.c | 148 +++
>  fs/tracefs/internal.h|  16 +
>  2 files changed, 151 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index b28f240bbb6d..d1683bf6d316 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -40,6 +40,15 @@ static DEFINE_MUTEX(eventfs_mutex);
>   */
>  DEFINE_STATIC_SRCU(eventfs_srcu);
>  
> +/* Mode is unsigned short, use the upper bits for flags */
> +enum {
> + EVENTFS_SAVE_MODE   = BIT(16),
> + EVENTFS_SAVE_UID= BIT(17),
> + EVENTFS_SAVE_GID= BIT(18),
> +};
> +
> +#define EVENTFS_MODE_MASK(EVENTFS_SAVE_MODE - 1)
> +
>  static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags);
> @@ -47,8 +56,89 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
> struct file *file);
>  static int dcache_readdir_wrapper(struct file *file, struct dir_context 
> *ctx);
>  static int eventfs_release(struct inode *inode, struct file *file);
>  
> +static void update_attr(struct eventfs_attr *attr, struct iattr *iattr)
> +{
> + unsigned int ia_valid = iattr->ia_valid;
> +
> + if (ia_valid & ATTR_MODE) {
> + attr->mode = (attr->mode & ~EVENTFS_MODE_MASK) |
> + (iattr->ia_mode & EVENTFS_MODE_MASK) |
> + EVENTFS_SAVE_MODE;
> + }
> + if (ia_valid & ATTR_UID) {
> + attr->mode |= EVENTFS_SAVE_UID;
> + attr->uid = iattr->ia_uid;
> + }
> + if (ia_valid & ATTR_GID) {
> + attr->mode |= EVENTFS_SAVE_GID;
> + attr->gid = iattr->ia_gid;
> + }
> +}
> +
> +static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
> + struct iattr *iattr)
> +{
> + const struct eventfs_entry *entry;
> + struct eventfs_inode *ei;
> + const char *name;
> + int ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ei = dentry->d_fsdata;
> + /* The LSB is set when the eventfs_inode is being freed */
> + if (((unsigned long)ei & 1UL) || ei->is_freed) {
> + /* Do not allow changes if the event is about to be removed. */
> + mutex_unlock(&eventfs_mutex);
> + return -ENODEV;
> + }
> +
> + /* Preallocate the children mode array if necessary */
> + if (!(dentry->d_inode->i_mode & S_IFDIR)) {
> + if (!ei->entry_attrs) {
> + ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * 
> ei->nr_entries,
> +   GFP_KERNEL);
> + if (!ei->entry_attrs) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + }
> + }
> +
> + ret = simple_setattr(idmap, dentry, iattr);
> + if (ret < 0)
> + goto out;
> +
> + /*
> +  * If this is a dir, then update the ei cache, only the file
> +  * mode is saved in the ei->m_children, and the ownership is
> +  * determined by the parent directory.
> +  */
> + if (dentry->d_inode->i_mode & S_IFDIR) {
> + update_attr(&ei->attr, iattr);
> +
> + } else {
> + name = dentry->d_name.name;
> +
> + for (int i = 0; i < ei->nr_entries; i++) {
> + entry = &ei->entries[i];
> + if (strcmp(name, entry->name) == 0) {
> + update_attr(&ei->entry_attrs[i], iattr);
> + break;
> + }
> + }
> + }
> + out:
> + mutex_unlock(&eventfs_mutex);
> + return ret;
> +}
> +
>  static const struct inode_operations eventfs_root_dir_inode_operations = {
>   .lookup = eventfs_root_lookup,
> + .setattr= eventfs_set_attr,
> +};
> +
> +static const struct inode_operations eventfs_file_inode_operations = {
> + .setattr= eventfs_set_attr,
>  };
>  
>  static const struct 

[PATCH v8 0/3] mm: use memmap_on_memory semantics for dax/kmem

2023-11-01 Thread Vishal Verma
The dax/kmem driver can potentially hot-add large amounts of memory
originating from CXL memory expanders, or NVDIMMs, or other 'device
memories'. There is a chance there isn't enough regular system memory
available to fit the memmap for this new memory. It's therefore
desirable, if all other conditions are met, for the kmem managed memory
to place its memmap on the newly added memory itself.

The main hurdle for accomplishing this for kmem is that memmap_on_memory
can only be done if the memory being added is equal to the size of one
memblock. To overcome this, allow the hotplug code to split an add_memory()
request into memblock-sized chunks, and try_remove_memory() to also
expect and handle such a scenario.

Patch 1 replaces an open-coded kmemdup()

Patch 2 teaches the memory_hotplug code to allow for splitting
add_memory() and remove_memory() requests over memblock sized chunks.

Patch 3 allows the dax region drivers to request memmap_on_memory
semantics. CXL dax regions default this to 'on', all others default to
off to keep existing behavior unchanged.

Signed-off-by: Vishal Verma 
---
Changes in v8:
- Fix unwinding in create_altmaps_and_memory_blocks() to remove
  partially added blocks and altmaps. (David, Ying)
- Simplify remove_memory_blocks_and_altmaps() since an altmap is
  assured. (David)
- Since we remove per memory-block altmaps, the walk through  memory
  blocks for a larger range isn't needed. Instead we can lookup the
  memory block directly from the pfn, allowing the test_has_altmap_cb()
  callback to be dropped and removed. (David)
- Link to v7: 
https://lore.kernel.org/r/20231025-vv-kmem_memmap-v7-0-4a76d7652...@intel.com

Changes in v7:
- Make the add_memory_resource() flow symmetrical w.r.t. try_remove_memory()
  in terms of how the altmap path is taken (David Hildenbrand)
- Move a comment, clean up usage of 'memblock' vs. 'memory_block'
  (David Hildenbrand)
- Don't use the altmap path for the 
mhp_supports_memmap_on_memory(memblock_size) == false
  case (Huang Ying)
- Link to v6: 
https://lore.kernel.org/r/20231016-vv-kmem_memmap-v6-0-078f0d3c0...@intel.com

Changes in v6:
- Add a prep patch to replace an open coded kmemdup in
  add_memory_resource() (Dan Williams)
- Fix ordering of firmware_map_remove w.r.t taking the hotplug lock
  (David Hildenbrand)
- Remove unused 'nid' variable, and a stray whitespace (David Hildenbrand)
- Clean up and simplify the altmap vs non-altmap paths for
  try_remove_memory (David Hildenbrand)
- Add a note to the changelog in patch 1 linking to the PUD mappings
  proposal (David Hildenbrand)
- Remove the new sysfs ABI from the kmem/dax drivers until ABI
  documentation for /sys/bus/dax can be established (will split this out
  into a separate patchset) (Dan Williams)
- Link to v5: 
https://lore.kernel.org/r/20231005-vv-kmem_memmap-v5-0-a54d1981f...@intel.com

Changes in v5:
- Separate out per-memblock operations from per memory block operations
  in try_remove_memory(), and rename the inner function appropriately.
  This does expand the scope of the memory hotplug lock to include
  remove_memory_block_devices(), but the alternative was to drop the
  lock in the inner function separately for each iteration, and then
  re-acquire it in try_remove_memory() creating a small window where
  the lock isn't held. (David Hildenbrand)
- Remove unnecessary rc check from the memmap_on_memory_store sysfs
  helper in patch 2 (Dan Carpenter)
- Link to v4: 
https://lore.kernel.org/r/20230928-vv-kmem_memmap-v4-0-6ff73fec5...@intel.com

Changes in v4:
- Rebase to Aneesh's PPC64 memmap_on_memory series v8 [2].
- Tweak a goto / error path in add_memory_create_devices() (Jonathan)
- Retain the old behavior for dax devices, only default to
  memmap_on_memory for CXL (Jonathan)
- Link to v3: 
https://lore.kernel.org/r/20230801-vv-kmem_memmap-v3-0-406e9aaf5...@intel.com

[2]: 
https://lore.kernel.org/linux-mm/20230808091501.287660-1-aneesh.ku...@linux.ibm.com

Changes in v3:
- Rebase on Aneesh's patches [1]
- Drop Patch 1 - it is not needed since [1] allows for dynamic setting
  of the memmap_on_memory param (David)
- Link to v2: 
https://lore.kernel.org/r/20230720-vv-kmem_memmap-v2-0-88bdaab34...@intel.com

[1]: https://lore.kernel.org/r/20230801044116.10674-1-aneesh.ku...@linux.ibm.com

Changes in v2:
- Drop the patch to create an override path for the memmap_on_memory
  module param (David)
- Move the chunking into memory_hotplug.c so that any caller of
  add_memory() can request this behavior. (David)
- Handle remove_memory() too. (David, Ying)
- Add a sysfs control in the kmem driver for memmap_on_memory semantics
  (David, Jonathan)
- Add a #else case to define mhp_supports_memmap_on_memory() if
  CONFIG_MEMORY_HOTPLUG is unset. (0day report)
- Link to v1: 
https://lore.kernel.org/r/20230613-vv-kmem_memmap-v1-0-f6de9c6af...@intel.com

---
Vishal Verma (3):
  mm/memory_hotplug: replace an open-coded kmemdup() in 
add_memory_resource()
  mm/memory_hotplug: 

[PATCH v8 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-11-01 Thread Vishal Verma
The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
'memblock_size' chunks of memory being added. Adding a larger span of
memory precludes memmap_on_memory semantics.

For users of hotplug such as kmem, large amounts of memory might get
added from the CXL subsystem. In some cases, this amount may exceed the
available 'main memory' to store the memmap for the memory being added.
In this case, it is useful to have a way to place the memmap on the
memory being added, even if it means splitting the addition into
memblock-sized chunks.

Change add_memory_resource() to loop over memblock-sized chunks of
memory if caller requested memmap_on_memory, and if other conditions for
it are met. Teach try_remove_memory() to also expect that a memory
range being removed might have been split up into memblock sized chunks,
and to loop through those as needed.

This does preclude being able to use PUD mappings in the direct map; a
proposal to how this could be optimized in the future is laid out
here[1].

[1]: 
https://lore.kernel.org/linux-mm/b6753402-2de9-25b2-36e9-eacd49752...@redhat.com/

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Suggested-by: David Hildenbrand 
Reviewed-by: Dan Williams 
Signed-off-by: Vishal Verma 
---
 mm/memory_hotplug.c | 213 ++--
 1 file changed, 138 insertions(+), 75 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6be7de9efa55..d242e49d7f7b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1380,6 +1380,84 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
return arch_supports_memmap_on_memory(vmemmap_size);
 }
 
+static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
+{
+   unsigned long memblock_size = memory_block_size_bytes();
+   u64 cur_start;
+
+   /*
+* For memmap_on_memory, the altmaps were added on a per-memblock
+* basis; we have to process each individual memory block.
+*/
+   for (cur_start = start; cur_start < start + size;
+cur_start += memblock_size) {
+   struct vmem_altmap *altmap = NULL;
+   struct memory_block *mem;
+
+   mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
+   WARN_ON_ONCE(!mem);
+   if (!mem)
+   continue;
+
+   altmap = mem->altmap;
+   mem->altmap = NULL;
+
+   remove_memory_block_devices(cur_start, memblock_size);
+
+   arch_remove_memory(cur_start, memblock_size, altmap);
+
+   /* Verify that all vmemmap pages have actually been freed. */
+   WARN(altmap->alloc, "Altmap not fully unmapped");
+   kfree(altmap);
+   }
+}
+
+static int create_altmaps_and_memory_blocks(int nid, struct memory_group 
*group,
+   u64 start, u64 size)
+{
+   unsigned long memblock_size = memory_block_size_bytes();
+   u64 cur_start;
+   int ret;
+
+   for (cur_start = start; cur_start < start + size;
+cur_start += memblock_size) {
+   struct mhp_params params = { .pgprot =
+pgprot_mhp(PAGE_KERNEL) };
+   struct vmem_altmap mhp_altmap = {
+   .base_pfn = PHYS_PFN(cur_start),
+   .end_pfn = PHYS_PFN(cur_start + memblock_size - 1),
+   };
+
+   mhp_altmap.free = memory_block_memmap_on_memory_pages();
+   params.altmap = kmemdup(&mhp_altmap, sizeof(struct vmem_altmap),
+   GFP_KERNEL);
+   if (!params.altmap)
+   return -ENOMEM;
+
+   /* call arch's memory hotadd */
+   ret = arch_add_memory(nid, cur_start, memblock_size, &params);
+   if (ret < 0) {
+   kfree(params.altmap);
+   goto out;
+   }
+
+   /* create memory block devices after memory was added */
+   ret = create_memory_block_devices(cur_start, memblock_size,
+ params.altmap, group);
+   if (ret) {
+   arch_remove_memory(cur_start, memblock_size, NULL);
+   kfree(params.altmap);
+   goto out;
+   }
+   }
+
+   return 0;
+out:
+   if (ret && (cur_start != start))
+   remove_memory_blocks_and_altmaps(start, cur_start - start);
+   return ret;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1390,10 +1468,6 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
 {
struct mhp_params 

[PATCH v8 3/3] dax/kmem: allow kmem to add memory with memmap_on_memory

2023-11-01 Thread Vishal Verma
Large amounts of memory managed by the kmem driver may come in via CXL,
and it is often desirable to have the memmap for this memory on the new
memory itself.

Enroll kmem-managed memory for memmap_on_memory semantics if the dax
region originates via CXL. For non-CXL dax regions, retain the existing
default behavior of hot adding without memmap_on_memory semantics.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Reviewed-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Reviewed-by: Huang, Ying 
Signed-off-by: Vishal Verma 
---
 drivers/dax/bus.h | 1 +
 drivers/dax/dax-private.h | 1 +
 drivers/dax/bus.c | 3 +++
 drivers/dax/cxl.c | 1 +
 drivers/dax/hmem/hmem.c   | 1 +
 drivers/dax/kmem.c| 8 +++-
 drivers/dax/pmem.c| 1 +
 7 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 1ccd23360124..cbbf64443098 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -23,6 +23,7 @@ struct dev_dax_data {
struct dev_pagemap *pgmap;
resource_size_t size;
int id;
+   bool memmap_on_memory;
 };
 
 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 27cf2d79..446617b73aea 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -70,6 +70,7 @@ struct dev_dax {
struct ida ida;
struct device dev;
struct dev_pagemap *pgmap;
+   bool memmap_on_memory;
int nr_range;
struct dev_dax_range {
unsigned long pgoff;
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0ee96e6fc426..ad9f821b8c78 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -367,6 +367,7 @@ static ssize_t create_store(struct device *dev, struct 
device_attribute *attr,
.dax_region = dax_region,
.size = 0,
.id = -1,
+   .memmap_on_memory = false,
};
struct dev_dax *dev_dax = devm_create_dev_dax(&data);
 
@@ -1400,6 +1401,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data 
*data)
dev_dax->align = dax_region->align;
ida_init(&dev_dax->ida);
 
+   dev_dax->memmap_on_memory = data->memmap_on_memory;
+
inode = dax_inode(dax_dev);
dev->devt = inode->i_rdev;
dev->bus = &dax_bus_type;
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 8bc9d04034d6..c696837ab23c 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -26,6 +26,7 @@ static int cxl_dax_region_probe(struct device *dev)
.dax_region = dax_region,
.id = -1,
.size = range_len(&cxlr_dax->hpa_range),
+   .memmap_on_memory = true,
};
 
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5d2ddef0f8f5..b9da69f92697 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -36,6 +36,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
.dax_region = dax_region,
.id = -1,
.size = region_idle ? 0 : range_len(&mri->range),
+   .memmap_on_memory = false,
};
 
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index c57acb73e3db..0aa6c45a4e5a 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dax-private.h"
 #include "bus.h"
 
@@ -56,6 +57,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
unsigned long total_len = 0;
struct dax_kmem_data *data;
int i, rc, mapped = 0;
+   mhp_t mhp_flags;
int numa_node;
 
/*
@@ -136,12 +138,16 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 */
res->flags = IORESOURCE_SYSTEM_RAM;
 
+   mhp_flags = MHP_NID_IS_MGID;
+   if (dev_dax->memmap_on_memory)
+   mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+
/*
 * Ensure that future kexec'd kernels will not treat
 * this as RAM automatically.
 */
rc = add_memory_driver_managed(data->mgid, range.start,
-   range_len(&range), kmem_name, MHP_NID_IS_MGID);
+   range_len(&range), kmem_name, mhp_flags);
 
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add 
failed\n",
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index ae0cb113a5d3..f3c6c67b8412 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -63,6 +63,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
.id = id,
.pgmap = &pgmap,

[PATCH v8 1/3] mm/memory_hotplug: replace an open-coded kmemdup() in add_memory_resource()

2023-11-01 Thread Vishal Verma
A review of the memmap_on_memory modifications to add_memory_resource()
revealed an instance of an open-coded kmemdup(). Replace it with
kmemdup().

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Reported-by: Dan Williams 
Reviewed-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
 mm/memory_hotplug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8d3e7427e32..6be7de9efa55 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,11 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
if (mhp_supports_memmap_on_memory(size)) {
mhp_altmap.free = memory_block_memmap_on_memory_pages();
-   params.altmap = kmalloc(sizeof(struct vmem_altmap), 
GFP_KERNEL);
+   params.altmap = kmemdup(&mhp_altmap,
+   sizeof(struct vmem_altmap),
+   GFP_KERNEL);
if (!params.altmap)
goto error;
-
-   memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
}
/* fallback to not using altmap  */
}

-- 
2.41.0




[GIT PULL] Modules changes for v6.7-rc1

2023-11-01 Thread Luis Chamberlain
The following changes since commit 7d461b291e65938f15f56fe58da2303b07578a76:

  Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm 
(2023-11-01 06:28:35 -1000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-6.7-rc1

for you to fetch changes up to ea0b0bcef4917a2640ecc100c768b8e785784834:

  module: Annotate struct module_notes_attrs with __counted_by (2023-11-01 
13:07:32 -0700)


Modules changes for v6.7-rc1

The only thing worth highlighting is that gzip moves to use vmalloc() instead of
kmalloc(), just as we had a fix for this for zstd in v6.6-rc1. The rest is regular
housekeeping, keeping things neat, tidy, and boring.

Oh and this has been on linux-next for over a month.


Andrea Righi (1):
  module/decompress: use vmalloc() for gzip decompression workspace

Kees Cook (2):
  module: Clarify documentation of module_param_call()
  module: Annotate struct module_notes_attrs with __counted_by

Luis Chamberlain (1):
  MAINTAINERS: add include/linux/module*.h to modules

Tiezhu Yang (2):
  module: Make is_mapping_symbol() return bool
  module: Make is_valid_name() return bool

Zhu Mao (1):
  module: Fix comment typo

 MAINTAINERS   | 2 +-
 include/linux/module_symbol.h | 2 +-
 include/linux/moduleparam.h   | 6 +-
 kernel/module/decompress.c| 4 ++--
 kernel/module/stats.c | 2 +-
 kernel/module/sysfs.c | 2 +-
 scripts/mod/modpost.c | 4 ++--
 7 files changed, 13 insertions(+), 9 deletions(-)



Re: [PATCH] eventfs: Fix kerneldoc of eventfs_remove_rec()

2023-11-01 Thread Steven Rostedt
On Mon, 30 Oct 2023 21:57:13 +0530
Mukesh Ojha  wrote:

> On 10/30/2023 9:45 PM, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > The eventfs_remove_rec() had some missing parameters in the kerneldoc
> > comment above it. Also, rephrase the description a bit more to have a bit
> > more correct grammar.
> > 
> > Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use 
> > eventfs_inode");
> > Reported-by: kernel test robot 
> > Closes: 
> > https://lore.kernel.org/oe-kbuild-all/202310052216.4sgqaswo-...@intel.com/
> > Signed-off-by: Steven Rostedt (Google)   
> 
> Reviewed-by: Mukesh Ojha 

Hi Mukesh!

First, I want to thank you for your reviews. We certainly need more
reviewers.

But I also need to state that "Reviewed-by" tags should not be sent so
lightly. A Reviewed-by tag should only be sent if you participated in the
discussion of the code, have authored some of the code being modified, or
are marked as a reviewer of the code in the MAINTAINERS file.

For example, you added to the discussion here:

   https://lore.kernel.org/all/65dcdd9c-a75b-4fe7-bdcf-471a5602d...@linaro.org/

And adding your Reviewed-by tag is appropriate.

But when a maintainer receives a Reviewed-by from someone they don't know,
without any discussion in the patch, it may make that maintainer think that
the person sending the Reviewed-by is only out to get listed in the LWN
"Reviewed-by" count.

I review other developers' code all the time, and unless the code touches
something I worked on or I'm marked as a reviewer in the MAINTAINERS file,
I do not send a Reviewed-by tag unless I added some input to the patch in
question.

My advice to you is to keep up the reviewing, I appreciate it (I really
do!), but don't send out Reviewed-by tags unless you are marked as a
reviewer of the code, or participated in a discussion on that code.

Thanks,

-- Steve



Re: [QUESTION] ftrace_test_recursion_trylock behaviour

2023-11-01 Thread Steven Rostedt
On Wed, 1 Nov 2023 18:32:14 +0100
Jiri Olsa  wrote:

> hi,
> I'm doing some testing on top of fprobes and noticed that the
> ftrace_test_recursion_trylock allows a caller from the same context
> to go through twice.
> 
> The change below adds extra fprobe on stack_trace_print, which is
> called within the sample_entry_handler and I can see it being executed
> with following trace output:
> 
>    <...>-457 [003] ...1.32.352554: sample_entry_handler: Enter  ip = 0x81177420
>    <...>-457 [003] ...2.32.352578: sample_entry_handler_extra: Enter  ip = 0x8127ae70
> 
> IOW nested ftrace_test_recursion_trylock call in the same context
> succeeded.
> 
> It seems the reason is the TRACE_CTX_TRANSITION bit logic.
> 
> Just making sure it's intentional.. we have kprobe_multi code on top of
> fprobe with another re-entry logic and that might behave differently based
> on ftrace_test_recursion_trylock logic.

Yes, it's intentional, as it's a workaround for an issue that may be
cleared up now with Peter Zijlstra's noinstr updates.

The use case for that TRACE_CTX_TRANSITION is when a function is traced
just after an interrupt was triggered but before the preempt count was
updated to let us know that we are in an interrupt context.

Daniel Bristot reported a regression after the trylock was first introduced
where the interrupt entry function was traced sometimes but not always.
That's because if the interrupt happened normally, it would be traced, but
if the interrupt happened when another event was being traced, the recursion
logic would see that the trace of the interrupt was happening in the same
context as the event it interrupted and drop the interrupt trace. But after
the preempt count was updated, the other functions in the interrupt would be
seen. This led to very confusing trace output.

The solution to that was this workaround hack, where the trace recursion
logic would allow a single recursion (the interrupt preempting another
trace before it set preempt count).
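
[ A much-simplified sketch of that idea: per-context recursion bits plus a
  single extra "transition" bit. The names here are hypothetical, and this
  is not the actual kernel recursion-protection code: ]

	/* Illustrative only */
	enum { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL, CTX_TRANSITION };

	static int recursion_trylock(unsigned long *recursion)
	{
		/* hypothetical helper: context derived from preempt_count() */
		int ctx = current_trace_ctx();

		if (test_and_set_bit(ctx, recursion)) {
			/*
			 * Already tracing in this context. Allow exactly one
			 * "transition" recursion to cover an interrupt that
			 * fired before preempt_count was updated to reflect
			 * the new context.
			 */
			if (test_and_set_bit(CTX_TRANSITION, recursion))
				return -1;	/* real recursion: reject */
			ctx = CTX_TRANSITION;
		}
		return ctx;	/* bit for the caller to clear when unlocking */
	}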

But with noinstr, there should be no more instances of this problem and we
can drop that extra bit. But the last I checked, there were a few places
that still could be traced without the preempt_count set. I'll have to
re-investigate.

-- Steve




[QUESTION] ftrace_test_recursion_trylock behaviour

2023-11-01 Thread Jiri Olsa
hi,
I'm doing some testing on top of fprobes and noticed that the
ftrace_test_recursion_trylock allows a caller from the same context
to go through twice.

The change below adds extra fprobe on stack_trace_print, which is
called within the sample_entry_handler and I can see it being executed
with following trace output:

   <...>-457 [003] ...1.32.352554: sample_entry_handler: Enter  ip = 0x81177420
   <...>-457 [003] ...2.32.352578: sample_entry_handler_extra: Enter  ip = 0x8127ae70

IOW nested ftrace_test_recursion_trylock call in the same context succeeded.

It seems the reason is the TRACE_CTX_TRANSITION bit logic.

Just making sure it's intentional.. we have kprobe_multi code on top of
fprobe with another re-entry logic and that might behave differently based
on ftrace_test_recursion_trylock logic.

thanks,
jirka


---
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 64e715e7ed11..531272af0682 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -23,6 +23,9 @@
 static struct fprobe sample_probe;
 static unsigned long nhit;
 
+static struct fprobe sample_probe_extra;
+static char *symbol_extra = "stack_trace_print";
+
 static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
 module_param_string(symbol, symbol, sizeof(symbol), 0644);
 MODULE_PARM_DESC(symbol, "Probed symbol(s), given by comma separated symbols 
or a wildcard pattern.");
@@ -48,6 +51,15 @@ static void show_backtrace(void)
stack_trace_print(stacks, len, 24);
 }
 
+static int sample_entry_handler_extra(struct fprobe *fp, unsigned long ip,
+ unsigned long ret_ip,
+ struct pt_regs *regs, void *data)
+{
+   if (use_trace)
+   trace_printk("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip);
+   return 0;
+}
+
 static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
struct pt_regs *regs, void *data)
@@ -96,6 +108,8 @@ static int __init fprobe_init(void)
sample_probe.entry_handler = sample_entry_handler;
sample_probe.exit_handler = sample_exit_handler;
 
+   sample_probe_extra.entry_handler = sample_entry_handler_extra;
+
if (strchr(symbol, '*')) {
/* filter based fprobe */
ret = register_fprobe(&sample_probe, symbol,
@@ -137,12 +151,19 @@ static int __init fprobe_init(void)
else
pr_info("Planted fprobe at %s\n", symbol);
 
+   ret = register_fprobe_syms(&sample_probe_extra, (const char **) &symbol_extra, 1);
+   if (ret < 0)
+   pr_err("register_fprobe extra failed, returned %d\n", ret);
+   else
+   pr_info("Planted extra fprobe at %s\n", symbol_extra);
+
return ret;
 }
 
 static void __exit fprobe_exit(void)
 {
unregister_fprobe(&sample_probe);
+   unregister_fprobe(&sample_probe_extra);
 
pr_info("fprobe at %s unregistered. %ld times hit, %ld times missed\n",
symbol, nhit, sample_probe.nmissed);



[PATCH v6 8/8] eventfs: Use simple_recursive_removal() to clean up dentries

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Looking at how dentry is removed via the tracefs system, I found that
eventfs does not do everything that it did under tracefs. The tracefs
removal of a dentry calls simple_recursive_removal() that does a lot more
than a simple d_invalidate().

It should be a requirement that if any eventfs_inode has a dentry, so does
its parent. When removing an eventfs_inode, if it has a dentry, a call to
simple_recursive_removal() on that dentry should clean up all the dentries
underneath it.

Add WARN_ON_ONCE() to check for the parent having a dentry if any children
do.

Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/

Cc: sta...@vger.kernel.org
Cc: Al Viro 
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) 
---
Changes since the last patch: 
https://lore.kernel.org/linux-trace-kernel/20231031144703.71eef...@gandalf.local.home

- Was originally called: eventfs: Process deletion of dentry more thoroughly

- Al Viro pointed out that I could use simple_recursive_removal() instead.
  I had originally thought that I could not, but looking deeper into it,
  I realized that if a dentry exists on any eventfs_inode, then all the
  parent eventfs_inodes of that one would also have a dentry. Hence, calling
  simple_recursive_removal() on the top dentry would clean up all the
  children dentries as well. Doing it this way cleans up the code quite
  a bit!

 fs/tracefs/event_inode.c | 77 +++-
 fs/tracefs/internal.h|  2 --
 2 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0087a3f455f1..f8a594a50ae6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -967,30 +967,29 @@ static void unhook_dentry(struct dentry *dentry)
 {
if (!dentry)
return;
-
-   /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+   /*
+* Need to add a reference to the dentry that is expected by
+* simple_recursive_removal(), which will include a dput().
+*/
dget(dentry);
 
-   dentry->d_fsdata = NULL;
-   d_invalidate(dentry);
-   mutex_lock(&eventfs_mutex);
-   /* dentry should now have at least a single reference */
-   WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after 
invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
-   mutex_unlock(&eventfs_mutex);
+   /*
+* Also add a reference for the dput() in eventfs_workfn().
+* That is required as that dput() will free the ei after
+* the SRCU grace period is over.
+*/
+   dget(dentry);
 }
 
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ei: eventfs_inode to be removed.
- * @head: the list head to place the deleted @ei and children
  * @level: prevent recursion from going more than 3 levels deep.
  *
  * This function recursively removes eventfs_inodes which
  * contains info of files and/or directories.
  */
-static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head 
*head, int level)
+static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
 {
struct eventfs_inode *ei_child;
 
@@ -1009,13 +1008,26 @@ static void eventfs_remove_rec(struct eventfs_inode 
*ei, struct list_head *head,
/* search for nested folders or files */
list_for_each_entry_srcu(ei_child, &ei->children, list,
 lockdep_is_held(&eventfs_mutex)) {
-   eventfs_remove_rec(ei_child, head, level + 1);
+   /* Children only have dentry if parent does */
+   WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+   eventfs_remove_rec(ei_child, level + 1);
}
 
+
ei->is_freed = 1;
 
+   for (int i = 0; i < ei->nr_entries; i++) {
+   if (ei->d_children[i]) {
+   /* Children only have dentry if parent does */
+   WARN_ON_ONCE(!ei->dentry);
+   unhook_dentry(ei->d_children[i]);
+   }
+   }
+
+   unhook_dentry(ei->dentry);
+
list_del_rcu(&ei->list);
-   list_add_tail(&ei->del_list, head);
+   call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
 }
 
 /**
@@ -1026,30 +1038,22 @@ static void eventfs_remove_rec(struct eventfs_inode 
*ei, struct list_head *head,
  */
 void eventfs_remove_dir(struct eventfs_inode *ei)
 {
-   struct eventfs_inode *tmp;
-   LIST_HEAD(ei_del_list);
+   struct dentry *dentry;
 
if (!ei)
return;
 
-   /*
-* Move the deleted eventfs_inodes onto the ei_del_list
-* which will also set the is_freed value. Note, this has to be
-* done under the eventfs_mutex, but the deletions of
-* the dentries must be done outside the eventfs_mutex.
-* Hence moving them to this temporary 

[PATCH v6 4/8] eventfs: Save ownership and mode

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Now that inodes and dentries are created on the fly, they are also
reclaimed on memory pressure. Since the ownership and file mode are saved
in the inode, if they are freed, any changes to the ownership and mode
will be lost.

To counter this, if the user changes the permissions or ownership, save
them, and when creating the inodes again, restore those changes.

Cc: Ajay Kaher 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.568912...@goodmis.org

- Resynced to this patch series

 fs/tracefs/event_inode.c | 148 +++
 fs/tracefs/internal.h|  16 +
 2 files changed, 151 insertions(+), 13 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index e9625732c52d..93d08e552483 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -40,6 +40,15 @@ static DEFINE_MUTEX(eventfs_mutex);
  */
 DEFINE_STATIC_SRCU(eventfs_srcu);
 
+/* Mode is unsigned short, use the upper bits for flags */
+enum {
+   EVENTFS_SAVE_MODE   = BIT(16),
+   EVENTFS_SAVE_UID= BIT(17),
+   EVENTFS_SAVE_GID= BIT(18),
+};
+
+#define EVENTFS_MODE_MASK  (EVENTFS_SAVE_MODE - 1)
+
 static struct dentry *eventfs_root_lookup(struct inode *dir,
  struct dentry *dentry,
  unsigned int flags);
@@ -47,8 +56,89 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
struct file *file);
 static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx);
 static int eventfs_release(struct inode *inode, struct file *file);
 
+static void update_attr(struct eventfs_attr *attr, struct iattr *iattr)
+{
+   unsigned int ia_valid = iattr->ia_valid;
+
+   if (ia_valid & ATTR_MODE) {
+   attr->mode = (attr->mode & ~EVENTFS_MODE_MASK) |
+   (iattr->ia_mode & EVENTFS_MODE_MASK) |
+   EVENTFS_SAVE_MODE;
+   }
+   if (ia_valid & ATTR_UID) {
+   attr->mode |= EVENTFS_SAVE_UID;
+   attr->uid = iattr->ia_uid;
+   }
+   if (ia_valid & ATTR_GID) {
+   attr->mode |= EVENTFS_SAVE_GID;
+   attr->gid = iattr->ia_gid;
+   }
+}
+
+static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
+   struct iattr *iattr)
+{
+   const struct eventfs_entry *entry;
+   struct eventfs_inode *ei;
+   const char *name;
+   int ret;
+
+   mutex_lock(&eventfs_mutex);
+   ei = dentry->d_fsdata;
+   /* The LSB is set when the eventfs_inode is being freed */
+   if (((unsigned long)ei & 1UL) || ei->is_freed) {
+   /* Do not allow changes if the event is about to be removed. */
+   mutex_unlock(&eventfs_mutex);
+   return -ENODEV;
+   }
+
+   /* Preallocate the children mode array if necessary */
+   if (!(dentry->d_inode->i_mode & S_IFDIR)) {
+   if (!ei->entry_attrs) {
+   ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * 
ei->nr_entries,
+ GFP_KERNEL);
+   if (!ei->entry_attrs) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   }
+   }
+
+   ret = simple_setattr(idmap, dentry, iattr);
+   if (ret < 0)
+   goto out;
+
+   /*
+* If this is a dir, then update the ei cache, only the file
+* mode is saved in the ei->m_children, and the ownership is
+* determined by the parent directory.
+*/
+   if (dentry->d_inode->i_mode & S_IFDIR) {
+   update_attr(&ei->attr, iattr);
+
+   } else {
+   name = dentry->d_name.name;
+
+   for (int i = 0; i < ei->nr_entries; i++) {
+   entry = &ei->entries[i];
+   if (strcmp(name, entry->name) == 0) {
+   update_attr(&ei->entry_attrs[i], iattr);
+   break;
+   }
+   }
+   }
+ out:
+   mutex_unlock(&eventfs_mutex);
+   return ret;
+}
+
 static const struct inode_operations eventfs_root_dir_inode_operations = {
.lookup = eventfs_root_lookup,
+   .setattr= eventfs_set_attr,
+};
+
+static const struct inode_operations eventfs_file_inode_operations = {
+   .setattr= eventfs_set_attr,
 };
 
 static const struct file_operations eventfs_file_operations = {
@@ -59,10 +149,30 @@ static const struct file_operations 
eventfs_file_operations = {
.release= eventfs_release,
 };
 
+static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, 
umode_t mode)
+{
+   if (!attr) {
+   inode->i_mode = mode;
+   return;
+   }
+
+   if (attr->mode & 

[PATCH v6 5/8] eventfs: Hold eventfs_mutex when calling callback functions

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The callback function that is used to create inodes and dentries is not
protected by anything and the data that is passed to it could become
stale. After eventfs_remove_dir() is called by the tracing system, it is
free to remove the events that are associated to that directory.
Unfortunately, that means the callbacks must not be called after that.

 CPU0   CPU1
    
 eventfs_root_lookup() {
 eventfs_remove_dir() {
  mutex_lock(&eventfs_mutex);
  ei->is_freed = set;
  mutex_unlock(&eventfs_mutex);
 }
 kfree(event_call);

for (...) {
  entry = &ei->entries[i];
  r = entry->callback() {
  call = data;  // call == event_call above
  if (call->flags ...)

 [ USE AFTER FREE BUG ]

The safest way to protect this is to wrap the callback with:

 mutex_lock(&eventfs_mutex);
 if (!ei->is_freed)
 r = entry->callback();
 else
 r = -1;
 mutex_unlock(&eventfs_mutex);

This will make sure that the callback will not be called after it is
freed. But now it needs to be known that the callback is called while
holding internal eventfs locks, and that it must not call back into the
eventfs / tracefs system. There's no reason it should anyway, but document
that as well.

Link: 
https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=y50wpwae...@mail.gmail.com/

Cc: Ajay Kaher 
Reported-by: Linux Kernel Functional Testing 
Reported-by: Naresh Kamboju 
Tested-by: Linux Kernel Functional Testing 
Tested-by: Naresh Kamboju 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.778161...@goodmis.org

- Resynced to this patch series

 fs/tracefs/event_inode.c | 22 ++--
 include/linux/tracefs.h  | 43 
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 93d08e552483..8ac9abf7a3d5 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -615,7 +615,13 @@ static struct dentry *eventfs_root_lookup(struct inode 
*dir,
 entry = &ei->entries[i];
 if (strcmp(name, entry->name) == 0) {
 void *cdata = data;
-   r = entry->callback(name, &mode, &cdata, &fops);
+   mutex_lock(&eventfs_mutex);
+   /* If ei->is_freed, then the event itself may be too */
+   if (!ei->is_freed)
+   r = entry->callback(name, &mode, &cdata, &fops);
+   else
+   r = -1;
+   mutex_unlock(&eventfs_mutex);
if (r <= 0)
continue;
ret = simple_lookup(dir, dentry, flags);
@@ -749,7 +755,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
struct file *file)
void *cdata = data;
 entry = &ei->entries[i];
 name = entry->name;
-   r = entry->callback(name, &mode, &cdata, &fops);
+   mutex_lock(&eventfs_mutex);
+   /* If ei->is_freed, then the event itself may be too */
+   if (!ei->is_freed)
+   r = entry->callback(name, &mode, &cdata, &fops);
+   else
+   r = -1;
+   mutex_unlock(&eventfs_mutex);
if (r <= 0)
continue;
d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, 
false);
@@ -819,6 +831,10 @@ static int dcache_readdir_wrapper(struct file *file, 
struct dir_context *ctx)
  *   data = A pointer to @data, and the callback may replace it, which will
  * cause the file created to pass the new data to the open() call.
  *   fops = the fops to use for the created file.
+ *
+ * NB. @callback is called while holding internal locks of the eventfs
+ * system. The callback must not call any code that might also call into
+ * the tracefs or eventfs system or it will risk creating a deadlock.
  */
 struct eventfs_inode *eventfs_create_dir(const char *name, struct 
eventfs_inode *parent,
 const struct eventfs_entry *entries,
@@ -878,6 +894,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, 
struct eventfs_inode
  * @data: The default data to pass to the files (an entry may override it).
  *
  * This function creates the top of the trace event directory.
+ *
+ * See eventfs_create_dir() for use of @entries.
  */
 struct eventfs_inode *eventfs_create_events_dir(const char *name, struct 
dentry *parent,
const struct eventfs_entry 
*entries,
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 13359b1a35d1..7a5fe17b6bf9 

[PATCH v6 6/8] eventfs: Delete eventfs_inode when the last dentry is freed

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

There exists a race between holding a reference of an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space has a dentry held long
enough, it may still be able to access the dentry's eventfs_inode after it
has been freed.

To prevent this, have the eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).

This means reintroducing the eventfs_inode del_list field as a temporary
place to put the eventfs_inode. The removal must mark the eventfs_inode as
freed (via the list) but also invalidate the dentry immediately, as callers
of eventfs_remove_dir() expect the dentries to be invalidated on return.
But the dentry invalidation must not be done under the eventfs_mutex, so it
is done after the eventfs_inode is marked as freed (put on a deletion list).
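
Roughly, the ordering described above looks like this (a simplified
sketch only, not the actual patch; the deletion-list processing and the
final free via dput()/SRCU are left out):

 static void example_remove(struct eventfs_inode *ei, struct list_head *del_list)
 {
	struct dentry *dentry;

	mutex_lock(&eventfs_mutex);
	ei->is_freed = 1;			/* no new users past this point */
	dentry = ei->dentry;			/* snapshot while still valid */
	list_add_tail(&ei->del_list, del_list);	/* defer the actual free */
	mutex_unlock(&eventfs_mutex);

	/* The invalidation must not be done under eventfs_mutex */
	if (dentry)
		d_invalidate(dentry);
 }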

Cc: sta...@vger.kernel.org
Cc: Ajay Kaher 
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.988874...@goodmis.org

- Rebased for this patch series

 fs/tracefs/event_inode.c | 146 ++-
 fs/tracefs/internal.h|   2 +
 2 files changed, 69 insertions(+), 79 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8ac9abf7a3d5..0a04ae0ca8c8 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -85,8 +85,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct 
dentry *dentry,
 
 	mutex_lock(&eventfs_mutex);
ei = dentry->d_fsdata;
-   /* The LSB is set when the eventfs_inode is being freed */
-   if (((unsigned long)ei & 1UL) || ei->is_freed) {
+   if (ei->is_freed) {
/* Do not allow changes if the event is about to be removed. */
 	mutex_unlock(&eventfs_mutex);
return -ENODEV;
@@ -276,35 +275,17 @@ static void free_ei(struct eventfs_inode *ei)
 void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry 
*dentry)
 {
struct tracefs_inode *ti_parent;
-   struct eventfs_inode *ei_child, *tmp;
struct eventfs_inode *ei;
int i;
 
/* The top level events directory may be freed by this */
if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
-   LIST_HEAD(ef_del_list);
-
-   mutex_lock(&eventfs_mutex);
-
ei = ti->private;
-
-   /* Record all the top level files */
-   list_for_each_entry_srcu(ei_child, &ei->children, list,
-lockdep_is_held(&eventfs_mutex)) {
-   list_add_tail(&ei_child->del_list, &ef_del_list);
-   }
-
/* Nothing should access this, but just in case! */
ti->private = NULL;
-
-   mutex_unlock(&eventfs_mutex);
 
-   /* Now safely free the top level files and their children */
-   list_for_each_entry_safe(ei_child, tmp, &ef_del_list, del_list) 
{
-   list_del(_child->del_list);
-   eventfs_remove_dir(ei_child);
-   }
-
free_ei(ei);
return;
}
@@ -319,14 +300,6 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, 
struct dentry *dentry)
if (!ei)
goto out;
 
-   /*
-* If ei was freed, then the LSB bit is set for d_fsdata.
-* But this should not happen, as it should still have a
-* ref count that prevents it. Warn in case it does.
-*/
-   if (WARN_ON_ONCE((unsigned long)ei & 1))
-   goto out;
-
/* This could belong to one of the files of the ei */
if (ei->dentry != dentry) {
for (i = 0; i < ei->nr_entries; i++) {
@@ -336,6 +309,8 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, 
struct dentry *dentry)
if (WARN_ON_ONCE(i == ei->nr_entries))
goto out;
ei->d_children[i] = NULL;
+   } else if (ei->is_freed) {
+   free_ei(ei);
} else {
ei->dentry = NULL;
}
@@ -962,13 +937,65 @@ struct eventfs_inode *eventfs_create_events_dir(const 
char *name, struct dentry
return ERR_PTR(-ENOMEM);
 }
 
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+   struct eventfs_inode *ei, *tmp;
+   struct llist_node *llnode;
+
+   llnode = llist_del_all(&free_list);
+   llist_for_each_entry_safe(ei, tmp, llnode, llist) {
+   /* This dput() matches the dget() from unhook_dentry() */
+   for (int i = 0; i < ei->nr_entries; i++) {
+   if (ei->d_children[i])
+   dput(ei->d_children[i]);
+   }
+   /* This should only get here if it had a dentry */
+   if (!WARN_ON_ONCE(!ei->dentry))
+   dput(ei->dentry);
+}
+}
+
+static 

[PATCH v6 3/8] eventfs: Test for ei->is_freed when accessing ei->dentry

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not. It
is protected by the eventfs_mutex. Anytime the eventfs_mutex is released,
and access to the ei->dentry needs to be done, it should first check if
ei->is_freed is set under the eventfs_mutex. If it is, then the ei->dentry
is invalid and must not be used. The ei->dentry must only be accessed
under the eventfs_mutex and after checking if ei->is_freed is set.

When the ei is being freed, it will (under the eventfs_mutex) set is_freed
and at the same time move the dentry to a free list to be cleared after
the eventfs_mutex is released. This means that any access to the
ei->dentry must check first if ei->is_freed is set, because if it is, then
the dentry is on its way to be freed.

Also add comments to describe this better.
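
The access pattern this enforces looks roughly like the following
(illustrative sketch only, not code taken from the patch):

 static struct dentry *example_get_dentry(struct eventfs_inode *ei)
 {
	struct dentry *dentry = NULL;

	mutex_lock(&eventfs_mutex);
	/* ei->dentry is only meaningful while is_freed is clear */
	if (!ei->is_freed)
		dentry = ei->dentry;
	mutex_unlock(&eventfs_mutex);

	return dentry;	/* NULL means the ei is on its way out */
 }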

Link: 
https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=soeywqoe19fgp3ur15sgowkdk+_x8...@mail.gmail.com/
Link: 
https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6...@mail.gmail.com/

Cc: Ajay Kaher 
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Linux Kernel Functional Testing 
Reported-by: Naresh Kamboju 
Reported-by: Beau Belgrave 
Reviewed-by: Masami Hiramatsu (Google) 
Tested-by: Linux Kernel Functional Testing 
Tested-by: Naresh Kamboju 
Tested-by: Beau Belgrave 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.347714...@goodmis.org

- Resynced to this patch series

 fs/tracefs/event_inode.c | 45 ++--
 fs/tracefs/internal.h|  3 ++-
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index dd5971855732..e9625732c52d 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -24,7 +24,20 @@
 #include 
 #include "internal.h"
 
+/*
+ * eventfs_mutex protects the eventfs_inode (ei) dentry. Any access
+ * to the ei->dentry must be done under this mutex and after checking
+ * if ei->is_freed is not set. The ei->dentry is released under the
+ * mutex at the same time ei->is_freed is set. If ei->is_freed is set
+ * then the ei->dentry is invalid.
+ */
 static DEFINE_MUTEX(eventfs_mutex);
+
+/*
+ * The eventfs_inode (ei) itself is protected by SRCU. It is released from
+ * its parent's list and will have is_freed set (under eventfs_mutex).
+ * After the SRCU grace period is over, the ei may be freed.
+ */
 DEFINE_STATIC_SRCU(eventfs_srcu);
 
 static struct dentry *eventfs_root_lookup(struct inode *dir,
@@ -239,6 +252,10 @@ create_file_dentry(struct eventfs_inode *ei, struct dentry 
**e_dentry,
bool invalidate = false;
 
 	mutex_lock(&eventfs_mutex);
+   if (ei->is_freed) {
+   mutex_unlock(&eventfs_mutex);
+   return NULL;
+   }
/* If the e_dentry already has a dentry, use it */
if (*e_dentry) {
/* lookup does not need to up the ref count */
@@ -312,6 +329,8 @@ static void eventfs_post_create_dir(struct eventfs_inode 
*ei)
struct eventfs_inode *ei_child;
struct tracefs_inode *ti;
 
+   lockdep_assert_held(&eventfs_mutex);
+
/* srcu lock already held */
/* fill parent-child relation */
	list_for_each_entry_srcu(ei_child, &ei->children, list,
@@ -325,6 +344,7 @@ static void eventfs_post_create_dir(struct eventfs_inode 
*ei)
 
 /**
  * create_dir_dentry - Create a directory dentry for the eventfs_inode
+ * @pei: The eventfs_inode parent of ei.
  * @ei: The eventfs_inode to create the directory for
  * @parent: The dentry of the parent of this directory
  * @lookup: True if this is called by the lookup code
@@ -332,12 +352,17 @@ static void eventfs_post_create_dir(struct eventfs_inode 
*ei)
  * This creates and attaches a directory dentry to the eventfs_inode @ei.
  */
 static struct dentry *
-create_dir_dentry(struct eventfs_inode *ei, struct dentry *parent, bool lookup)
+create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
+ struct dentry *parent, bool lookup)
 {
bool invalidate = false;
struct dentry *dentry = NULL;
 
 	mutex_lock(&eventfs_mutex);
+   if (pei->is_freed || ei->is_freed) {
+   mutex_unlock(&eventfs_mutex);
+   return NULL;
+   }
if (ei->dentry) {
/* If the dentry already has a dentry, use it */
dentry = ei->dentry;
@@ -440,7 +465,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 */
 	mutex_lock(&eventfs_mutex);
ei = READ_ONCE(ti->private);
-   if (ei)
+   if (ei && !ei->is_freed)
ei_dentry = READ_ONCE(ei->dentry);
 	mutex_unlock(&eventfs_mutex);
 
@@ -454,7 +479,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
if (strcmp(ei_child->name, name) != 0)
continue;
ret = simple_lookup(dir, dentry, flags);
-

[PATCH v6 2/8] eventfs: Have a free_ei() that just frees the eventfs_inode

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

As the eventfs_inode is freed in two different locations, make a helper
function free_ei() to make sure all the allocated fields of the
eventfs_inode is freed.

This requires renaming the existing free_ei() which is called by the srcu
handler to free_rcu_ei() and have free_ei() just do the freeing, where
free_rcu_ei() will call it.

Cc: Ajay Kaher 
Reviewed-by: Masami Hiramatsu (Google) 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.141626...@goodmis.org

- Resynced to this patch series

 fs/tracefs/event_inode.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 1ce73acf3df0..dd5971855732 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -129,6 +129,13 @@ static struct dentry *create_dir(const char *name, struct 
dentry *parent)
return eventfs_end_creating(dentry);
 }
 
+static void free_ei(struct eventfs_inode *ei)
+{
+   kfree_const(ei->name);
+   kfree(ei->d_children);
+   kfree(ei);
+}
+
 /**
  * eventfs_set_ei_status_free - remove the dentry reference from an 
eventfs_inode
  * @ti: the tracefs_inode of the dentry
@@ -168,9 +175,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, 
struct dentry *dentry)
eventfs_remove_dir(ei_child);
}
 
-   kfree_const(ei->name);
-   kfree(ei->d_children);
-   kfree(ei);
+   free_ei(ei);
return;
}
 
@@ -784,13 +789,11 @@ struct eventfs_inode *eventfs_create_events_dir(const 
char *name, struct dentry
return ERR_PTR(-ENOMEM);
 }
 
-static void free_ei(struct rcu_head *head)
+static void free_rcu_ei(struct rcu_head *head)
 {
struct eventfs_inode *ei = container_of(head, struct eventfs_inode, 
rcu);
 
-   kfree_const(ei->name);
-   kfree(ei->d_children);
-   kfree(ei);
+   free_ei(ei);
 }
 
 /**
@@ -881,7 +884,7 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
for (i = 0; i < ei->nr_entries; i++)
 		unhook_dentry(&ei->d_children[i], &dentry_list);
 	unhook_dentry(&ei->dentry, &dentry_list);
-	call_srcu(&eventfs_srcu, &ei->rcu, free_ei);
+	call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
mutex_unlock(_mutex);
 
-- 
2.42.0



[PATCH v6 1/8] eventfs: Remove "is_freed" union with rcu head

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The eventfs_inode->is_freed was a union with the rcu_head with the
assumption that when it was on the srcu list the head would contain a
pointer which would make "is_freed" true. But that was a wrong assumption
as the rcu head is a single link list where the last element is NULL.

Instead, split the nr_entries integer so that "is_freed" is one bit and
the nr_entries is the next 31 bits. As there shouldn't be more than 10
(currently there's at most 5 to 7 depending on the config), this should
not be a problem.

Cc: sta...@vger.kernel.org
Cc: Ajay Kaher 
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223419.935276...@goodmis.org

- Resynced for this patch series

 fs/tracefs/event_inode.c | 2 ++
 fs/tracefs/internal.h| 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9f612a8f009d..1ce73acf3df0 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, 
struct list_head *head,
eventfs_remove_rec(ei_child, head, level + 1);
}
 
+   ei->is_freed = 1;
+
	list_del_rcu(&ei->list);
	list_add_tail(&ei->del_list, head);
 }
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 64fde9490f52..c7d88aaa949f 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -23,6 +23,7 @@ struct tracefs_inode {
  * @d_parent:   pointer to the parent's dentry
  * @d_children: The array of dentries to represent the files when created
  * @data:  The private data to pass to the callbacks
+ * @is_freed:  Flag set if the eventfs is on its way to be freed
  * @nr_entries: The number of items in @entries
  */
 struct eventfs_inode {
@@ -38,14 +39,13 @@ struct eventfs_inode {
 * Union - used for deletion
 * @del_list:   list of eventfs_inode to delete
 * @rcu:eventfs_inode to delete in RCU
-* @is_freed:   node is freed if one of the above is set
 */
union {
struct list_headdel_list;
struct rcu_head rcu;
-   unsigned long   is_freed;
};
-   int nr_entries;
+   unsigned intis_freed:1;
+   unsigned intnr_entries:31;
 };
 
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
-- 
2.42.0



[PATCH v6 7/8] eventfs: Remove special processing of dput() of events directory

2023-11-01 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The top level events directory is no longer special with regard to how it
should be deleted. Remove the extra processing for it in
eventfs_set_ei_status_free().

Cc: Ajay Kaher 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v5: https://lkml.kernel.org/r/20231031223421.185413...@goodmis.org

- Resynced to this patch series

 fs/tracefs/event_inode.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0a04ae0ca8c8..0087a3f455f1 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -274,28 +274,11 @@ static void free_ei(struct eventfs_inode *ei)
  */
 void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry 
*dentry)
 {
-   struct tracefs_inode *ti_parent;
struct eventfs_inode *ei;
int i;
 
-   /* The top level events directory may be freed by this */
-   if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
-   mutex_lock(&eventfs_mutex);
-   ei = ti->private;
-   /* Nothing should access this, but just in case! */
-   ti->private = NULL;
-   mutex_unlock(&eventfs_mutex);
-
-   free_ei(ei);
-   return;
-   }
-
 	mutex_lock(&eventfs_mutex);
 
-   ti_parent = get_tracefs(dentry->d_parent->d_inode);
-   if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
-   goto out;
-
ei = dentry->d_fsdata;
if (!ei)
goto out;
@@ -920,6 +903,8 @@ struct eventfs_inode *eventfs_create_events_dir(const char 
*name, struct dentry
 	inode->i_op = &eventfs_root_dir_inode_operations;
 	inode->i_fop = &eventfs_file_operations;
 
+   dentry->d_fsdata = ei;
+
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
d_instantiate(dentry, inode);
-- 
2.42.0



[PATCH v6 0/8] eventfs: Fixing dynamic creation

2023-11-01 Thread Steven Rostedt


Changes since v5: 
https://lore.kernel.org/all/20231031223326.794680...@goodmis.org/

This was originally based on:

  [PATCH] eventfs: Process deletion of dentry more thoroughly
  
https://lore.kernel.org/linux-trace-kernel/20231031144703.71eef...@gandalf.local.home/

But Al Viro asked why I do not use simple_recursive_removal()?
Which at first I thought I could not, but after playing with it
more I realized I could (with a little addition of dget()'s to
balance the dput()s). This simplifies the code. But to do this, it
really needs to be done *after* these changes. Hence, I moved that
patch to after this series and to handle this rebase, I'm reposting
all the patches and including the above mentioned patch at the end
and renamed to:

  eventfs: Use simple_recursive_removal() to clean up dentries

I hope this is the last series, as this *is* for this merge window!

Steven Rostedt (Google) (8):
  eventfs: Remove "is_freed" union with rcu head
  eventfs: Have a free_ei() that just frees the eventfs_inode
  eventfs: Test for ei->is_freed when accessing ei->dentry
  eventfs: Save ownership and mode
  eventfs: Hold eventfs_mutex when calling callback functions
  eventfs: Delete eventfs_inode when the last dentry is freed
  eventfs: Remove special processing of dput() of events directory
  eventfs: Use simple_recursive_removal() to clean up dentries


 fs/tracefs/event_inode.c | 420 ---
 fs/tracefs/internal.h|  29 +++-
 include/linux/tracefs.h  |  43 +
 3 files changed, 357 insertions(+), 135 deletions(-)



stable-rc: 5.15 - all builds failed - ld.lld: error: undefined symbol: kallsyms_on_each_symbol

2023-11-01 Thread Naresh Kamboju
Hi Greg,

I see the following build warning / errors everywhere on stable-rc 5.15 branch.

ld.lld: error: undefined symbol: kallsyms_on_each_symbol
>>> referenced by trace_kprobe.c
>>>   trace/trace_kprobe.o:(create_local_trace_kprobe) in archive 
>>> kernel/built-in.a
>>> referenced by trace_kprobe.c
>>>   trace/trace_kprobe.o:(__trace_kprobe_create) in archive 
>>> kernel/built-in.a
make[1]: *** [Makefile:1227: vmlinux] Error 1

Links,
 - 
https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXALLRIZaXJVcqhff4ZmGTeZoQ/

- Naresh



Re: stable-rc: 5.4 - all builds failed - ld.lld: error: undefined symbol: kallsyms_on_each_symbol

2023-11-01 Thread Naresh Kamboju
On Wed, 1 Nov 2023 at 20:38, Naresh Kamboju  wrote:
>
> Hi Greg,
>
> I see the following build warning / errors everywhere on stable-rc 5.4 branch.

I am sorry,
It is on stable-rc 5.15.

>
> ld.lld: error: undefined symbol: kallsyms_on_each_symbol
> >>> referenced by trace_kprobe.c
> >>>   trace/trace_kprobe.o:(create_local_trace_kprobe) in archive 
> >>> kernel/built-in.a
> >>> referenced by trace_kprobe.c
> >>>   trace/trace_kprobe.o:(__trace_kprobe_create) in archive 
> >>> kernel/built-in.a
> make[1]: *** [Makefile:1227: vmlinux] Error 1

I see the following latest commit on stable-rc 5.15 branch

tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols
commit b022f0c7e404887a7c5229788fc99eff9f9a80d5 upstream.

- Naresh



> Links,
>  - 
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXALLRIZaXJVcqhff4ZmGTeZoQ/
>
> - Naresh



stable-rc: 5.4 - all builds failed - ld.lld: error: undefined symbol: kallsyms_on_each_symbol

2023-11-01 Thread Naresh Kamboju
Hi Greg,

I see the following build warning / errors everywhere on stable-rc 5.4 branch.

ld.lld: error: undefined symbol: kallsyms_on_each_symbol
>>> referenced by trace_kprobe.c
>>>   trace/trace_kprobe.o:(create_local_trace_kprobe) in archive 
>>> kernel/built-in.a
>>> referenced by trace_kprobe.c
>>>   trace/trace_kprobe.o:(__trace_kprobe_create) in archive 
>>> kernel/built-in.a
make[1]: *** [Makefile:1227: vmlinux] Error 1

Links,
 - 
https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXALLRIZaXJVcqhff4ZmGTeZoQ/

- Naresh



Re: [PATCH] eventfs: Process deletion of dentry more thoroughly

2023-11-01 Thread Steven Rostedt
On Wed, 1 Nov 2023 00:16:59 -0400
Steven Rostedt  wrote:

> On Wed, 1 Nov 2023 02:25:53 +
> Al Viro  wrote:
> 
> > Umm...  Is there any reason not to use simple_recursive_removal() there?  
> 
> Hmm, I may be able to (I'm still a newbie with understanding of the vfs).
> 
> I did it this way thinking that a dentry may exist in the children but not
> at a higher level, but I don't think that can be the case. This creates
> dentries and inodes dynamically when they are referenced. The eventfs_inode
> maps to each directory (the files of a directory are created from the
> information from the eventfs_inode).

OK, so I tried to use simple_recursive_removal() and failed miserably!

I think I know why. What happened was the last child would get one more
"dput" than it needed. That's because dentries can exist without any
reference on them and they don't disappear until a reclaim happens. What I
mean is, when a file is "open()'d" a dentry is created on the fly so that
the user can access the file. When it is "close()'d" the dentry count goes
to zero.

Then on memory reclaim, the dentries may be removed. If another open
happens, the dentry is created again, or the one that is still cached can
be reinstated.

It looks like simple_recursive_removal() expects all dentries to have a
reference count of at least 1 when entering, which is not the case here.

But!

Now what I could do is to do a dget() when removing the eventfs_inodes (ei)
on any dentry that is attached to them.

/me goes and tries that...

OK, that actually seems to work. With the assumption that there will never
be dentry without a parent I think I can go this approach.

Thanks!

-- Steve



Re: [PATCH v2 2/3] remoteproc: qcom: pas: make region assign more generic

2023-11-01 Thread Mukesh Ojha




On 10/31/2023 10:36 PM, Neil Armstrong wrote:

Hi,

On 30/10/2023 14:10, Mukesh Ojha wrote:



On 10/30/2023 3:33 PM, Neil Armstrong wrote:

The current memory region assign only supports a single
memory region.

But new platforms introduce more regions to make the
memory requirements more flexible for various use cases.
Those new platforms also share the memory region between the
DSP and HLOS.

To handle this, make the region assign more generic in order
to support more than a single memory region and also permit
setting the regions' permissions as shared.

Signed-off-by: Neil Armstrong 
---
  drivers/remoteproc/qcom_q6v5_pas.c | 102 
-

  1 file changed, 66 insertions(+), 36 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c

index 913a5d2068e8..4829fd26e17d 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -33,6 +33,8 @@
  #define ADSP_DECRYPT_SHUTDOWN_DELAY_MS    100
+#define MAX_ASSIGN_COUNT 2
+
  struct adsp_data {
  int crash_reason_smem;
  const char *firmware_name;
@@ -51,6 +53,9 @@ struct adsp_data {
  int ssctl_id;
  int region_assign_idx;
+    int region_assign_count;
+    bool region_assign_shared;
+    int region_assign_vmid;
  };
  struct qcom_adsp {
@@ -87,15 +92,18 @@ struct qcom_adsp {
  phys_addr_t dtb_mem_phys;
  phys_addr_t mem_reloc;
  phys_addr_t dtb_mem_reloc;
-    phys_addr_t region_assign_phys;
+    phys_addr_t region_assign_phys[MAX_ASSIGN_COUNT];
  void *mem_region;
  void *dtb_mem_region;
  size_t mem_size;
  size_t dtb_mem_size;
-    size_t region_assign_size;
+    size_t region_assign_size[MAX_ASSIGN_COUNT];
  int region_assign_idx;
-    u64 region_assign_perms;
+    int region_assign_count;
+    bool region_assign_shared;
+    int region_assign_vmid;
+    u64 region_assign_perms[MAX_ASSIGN_COUNT];
  struct qcom_rproc_glink glink_subdev;
  struct qcom_rproc_subdev smd_subdev;
@@ -590,37 +598,52 @@ static int adsp_alloc_memory_region(struct 
qcom_adsp *adsp)

  static int adsp_assign_memory_region(struct qcom_adsp *adsp)
  {
-    struct reserved_mem *rmem = NULL;
-    struct qcom_scm_vmperm perm;
+    struct qcom_scm_vmperm perm[MAX_ASSIGN_COUNT];
+    unsigned int perm_size = 1;


AFAICS, not need of initialization.


Indeed, removed




  struct device_node *node;
-    int ret;
+    int offset, ret;


Nit: one variable per line.


Done




  if (!adsp->region_assign_idx)


Not related to this patch..
Should not this be valid only for > 1 ?


I don't understand, only region_assign_idx > 1 triggers a memory_assign,
and this check discards configurations with region_assign_idx == 0 as
expected.


Ah, you can ignore the comments, I got the intention after commenting
here ..







  return 0;
-    node = of_parse_phandle(adsp->dev->of_node, "memory-region", 
adsp->region_assign_idx);

-    if (node)
-    rmem = of_reserved_mem_lookup(node);
-    of_node_put(node);
-    if (!rmem) {
-    dev_err(adsp->dev, "unable to resolve shareable 
memory-region\n");

-    return -EINVAL;
-    }
+    for (offset = 0; offset < adsp->region_assign_count; ++offset) {
+    struct reserved_mem *rmem = NULL;
+
+    node = of_parse_phandle(adsp->dev->of_node, "memory-region",
+    adsp->region_assign_idx + offset);
+    if (node)
+    rmem = of_reserved_mem_lookup(node);
+    of_node_put(node);
+    if (!rmem) {
+    dev_err(adsp->dev, "unable to resolve shareable 
memory-region index %d\n",

+    offset);
+    return -EINVAL;
+    }




-    perm.vmid = QCOM_SCM_VMID_MSS_MSA;
-    perm.perm = QCOM_SCM_PERM_RW;
+    if (adsp->region_assign_shared)  {
+    perm[0].vmid = QCOM_SCM_VMID_HLOS;
+    perm[0].perm = QCOM_SCM_PERM_RW;
+    perm[1].vmid = adsp->region_assign_vmid;
+    perm[1].perm = QCOM_SCM_PERM_RW;
+    perm_size = 2;
+    } else {
+    perm[0].vmid = adsp->region_assign_vmid;
+    perm[0].perm = QCOM_SCM_PERM_RW;
+    perm_size = 1;
+    }
-    adsp->region_assign_phys = rmem->base;
-    adsp->region_assign_size = rmem->size;
-    adsp->region_assign_perms = BIT(QCOM_SCM_VMID_HLOS);
+    adsp->region_assign_phys[offset] = rmem->base;
+    adsp->region_assign_size[offset] = rmem->size;
+    adsp->region_assign_perms[offset] = BIT(QCOM_SCM_VMID_HLOS);


Do we need array for this, is this changing ?


We need to keep region_assign_perms for unassign, but for the other 2 we 
would

need to duplicate the code from adsp_assign_memory_region into
adsp_unassign_memory_region.


Thanks got it.






-    ret = qcom_scm_assign_mem(adsp->region_assign_phys,
-  adsp->region_assign_size,
-  &adsp->region_assign_perms,
-  &perm, 1);
-    if (ret < 0) {
-    dev_err(adsp->dev, "assign 

Re: [PATCH v5 2/7] eventfs: Have a free_ei() that just frees the eventfs_inode

2023-11-01 Thread Google
On Tue, 31 Oct 2023 18:33:28 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> As the eventfs_inode is freed in two different locations, make a helper
> function free_ei() to make sure all the allocated fields of the
> eventfs_inode is freed.
> 
> This requires renaming the existing free_ei() which is called by the srcu
> handler to free_rcu_ei() and have free_ei() just do the freeing, where
> free_rcu_ei() will call it.

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> 
> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v4: 
> https://lore.kernel.org/all/20231031193428.133533...@goodmis.org/T/#u
> 
> - Rebased to this patch series
> 
>  fs/tracefs/event_inode.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 2c2c75b2ad73..0331d9bd568b 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -129,6 +129,13 @@ static struct dentry *create_dir(const char *name, 
> struct dentry *parent)
>   return eventfs_end_creating(dentry);
>  }
>  
> +static void free_ei(struct eventfs_inode *ei)
> +{
> + kfree_const(ei->name);
> + kfree(ei->d_children);
> + kfree(ei);
> +}
> +
>  /**
>   * eventfs_set_ei_status_free - remove the dentry reference from an 
> eventfs_inode
>   * @ti: the tracefs_inode of the dentry
> @@ -168,9 +175,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, 
> struct dentry *dentry)
>   eventfs_remove_dir(ei_child);
>   }
>  
> - kfree_const(ei->name);
> - kfree(ei->d_children);
> - kfree(ei);
> + free_ei(ei);
>   return;
>   }
>  
> @@ -784,13 +789,11 @@ struct eventfs_inode *eventfs_create_events_dir(const 
> char *name, struct dentry
>   return ERR_PTR(-ENOMEM);
>  }
>  
> -static void free_ei(struct rcu_head *head)
> +static void free_rcu_ei(struct rcu_head *head)
>  {
>   struct eventfs_inode *ei = container_of(head, struct eventfs_inode, 
> rcu);
>  
> - kfree_const(ei->name);
> - kfree(ei->d_children);
> - kfree(ei);
> + free_ei(ei);
>  }
>  
>  /**
> @@ -883,7 +886,7 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
>   for (i = 0; i < ei->nr_entries; i++)
> 		unhook_dentry(&ei->d_children[i], &dentry_list);
> 	unhook_dentry(&ei->dentry, &dentry_list);
> -	call_srcu(&eventfs_srcu, &ei->rcu, free_ei);
> +	call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
>   }
>   mutex_unlock(_mutex);
>  
> -- 
> 2.42.0


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v5 1/7] eventfs: Remove "is_freed" union with rcu head

2023-11-01 Thread Google
On Tue, 31 Oct 2023 18:33:27 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> The eventfs_inode->is_freed was a union with the rcu_head with the
> assumption that when it was on the srcu list the head would contain a
> pointer which would make "is_freed" true. But that was a wrong assumption
> as the rcu head is a single link list where the last element is NULL.
> 
> Instead, split the nr_entries integer so that "is_freed" is one bit and
> the nr_entries is the next 31 bits. As there shouldn't be more than 10
> (currently there's at most 5 to 7 depending on the config), this should
> not be a problem.

Yeah, even 16 bit nr_entries is enough.. (maybe the biggest user is
syscall event group)

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thank you,


> Cc: sta...@vger.kernel.org
> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open 
> functions")
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  fs/tracefs/event_inode.c | 2 ++
>  fs/tracefs/internal.h| 6 +++---
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 754885dfe71c..2c2c75b2ad73 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, 
> struct list_head *head,
>   eventfs_remove_rec(ei_child, head, level + 1);
>   }
>  
> + ei->is_freed = 1;
> +
> 	list_del_rcu(&ei->list);
> 	list_add_tail(&ei->del_list, head);
>  }
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index 64fde9490f52..c7d88aaa949f 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -23,6 +23,7 @@ struct tracefs_inode {
>   * @d_parent:   pointer to the parent's dentry
>   * @d_children: The array of dentries to represent the files when created
>   * @data:The private data to pass to the callbacks
> + * @is_freed:Flag set if the eventfs is on its way to be freed
>   * @nr_entries: The number of items in @entries
>   */
>  struct eventfs_inode {
> @@ -38,14 +39,13 @@ struct eventfs_inode {
>* Union - used for deletion
>* @del_list:   list of eventfs_inode to delete
>* @rcu:eventfs_inode to delete in RCU
> -  * @is_freed:   node is freed if one of the above is set
>*/
>   union {
>   struct list_headdel_list;
>   struct rcu_head rcu;
> - unsigned long   is_freed;
>   };
> - int nr_entries;
> + unsigned intis_freed:1;
> + unsigned intnr_entries:31;
>  };
>  
>  static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
> -- 
> 2.42.0


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v3] seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()

2023-11-01 Thread Christophe JAILLET

On 27/10/2023 at 17:56, Kees Cook wrote:

Solve two ergonomic issues with struct seq_buf;

1) Too much boilerplate is required to initialize:

struct seq_buf s;
char buf[32];

seq_buf_init(s, buf, sizeof(buf));

Instead, we can build this directly on the stack. Provide
DECLARE_SEQ_BUF() macro to do this:

DECLARE_SEQ_BUF(s, 32);

2) %NUL termination is fragile and requires 2 steps to get a valid
C String (and is a layering violation exposing the "internals" of
seq_buf):

seq_buf_terminate(s);
do_something(s->buffer);

Instead, we can just return s->buffer directly after terminating it in
the refactored seq_buf_terminate(), now known as seq_buf_str():

do_something(seq_buf_str(s));
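
Put together, a trivial usage sketch (illustrative only; the function and
the values here are made up, the API is from this patch):

 static void example_report(void)
 {
	DECLARE_SEQ_BUF(report, 64);	/* stack buffer, no seq_buf_init() needed */

	seq_buf_printf(&report, "threshold=%d", 42);
	pr_info("state: %s\n", seq_buf_str(&report));
 }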

Cc: Steven Rostedt 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Christoph Hellwig 
Cc: Justin Stitt 
Cc: Kent Overstreet 
Cc: Petr Mladek 
Cc: Andy Shevchenko 
Cc: Rasmus Villemoes 
Cc: Sergey Senozhatsky 
Cc: Masami Hiramatsu 
Cc: Greg Kroah-Hartman 
Cc: Arnd Bergmann 
Cc: Jonathan Corbet 
Cc: Yun Zhou 
Cc: Jacob Keller 
Cc: Zhen Lei 
Cc: linux-trace-ker...@vger.kernel.org
Link: https://lore.kernel.org/r/20231026194033.it.702-k...@kernel.org
Signed-off-by: Kees Cook 
---
v3
  - fix commit log typos
  - improve code style for DECLARE_SEQ_BUF (shevchenko)
  - const-ify seq_bug_str() return (rostedt)
v2 - https://lore.kernel.org/lkml/20231026194033.it.702-k...@kernel.org
v1 - https://lore.kernel.org/lkml/20231026170722.work.638-k...@kernel.org
---
  include/linux/seq_buf.h | 21 +
  kernel/trace/trace.c| 11 +--
  lib/seq_buf.c   |  4 +---
  3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 8483e4b2d0d2..5fb1f12c33f9 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -21,9 +21,18 @@ struct seq_buf {
size_t  len;
  };
  
+#define DECLARE_SEQ_BUF(NAME, SIZE)			\

+   char __ ## NAME ## _buffer[SIZE] = "";\
+   struct seq_buf NAME = { \
+   .buffer = &__ ## NAME ## _buffer,   \

 ~~~
Is the & needed here?

CJ


+   .size = SIZE,   \
+   }
+
  static inline void seq_buf_clear(struct seq_buf *s)
  {
s->len = 0;
+   if (s->size)
+   s->buffer[0] = '\0';
  }
  
  static inline void

@@ -69,8 +78,8 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
  }
  
  /**

- * seq_buf_terminate - Make sure buffer is nul terminated
- * @s: the seq_buf descriptor to terminate.
+ * seq_buf_str - get %NUL-terminated C string from seq_buf
+ * @s: the seq_buf handle
   *
   * This makes sure that the buffer in @s is nul terminated and
   * safe to read as a string.
@@ -81,16 +90,20 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
   *
   * After this function is called, s->buffer is safe to use
   * in string operations.
+ *
 * Returns @s->buffer after making sure it is terminated.
   */
-static inline void seq_buf_terminate(struct seq_buf *s)
+static inline const char *seq_buf_str(struct seq_buf *s)
  {
if (WARN_ON(s->size == 0))
-   return;
+   return "";
  
  	if (seq_buf_buffer_left(s))

s->buffer[s->len] = 0;
else
s->buffer[s->size - 1] = 0;
+
+   return s->buffer;
  }
  
  /**

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d629065c2383..2539cfc20a97 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3828,15 +3828,6 @@ static bool trace_safe_str(struct trace_iterator *iter, 
const char *str,
return false;
  }
  
-static const char *show_buffer(struct trace_seq *s)

-{
-   struct seq_buf *seq = &s->seq;
-
-   seq_buf_terminate(seq);
-
-   return seq->buffer;
-}
-
  static DEFINE_STATIC_KEY_FALSE(trace_no_verify);
  
  static int test_can_verify_check(const char *fmt, ...)

@@ -3976,7 +3967,7 @@ void trace_check_vprintf(struct trace_iterator *iter, 
const char *fmt,
 */
if (WARN_ONCE(!trace_safe_str(iter, str, star, len),
  "fmt: '%s' current_buffer: '%s'",
- fmt, show_buffer(&iter->seq))) {
+ fmt, seq_buf_str(&iter->seq.seq))) {
int ret;
  
  			/* Try to safely read the string */

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index b7477aefff53..23518f77ea9c 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -109,9 +109,7 @@ void seq_buf_do_printk(struct seq_buf *s, const char *lvl)
if (s->size == 0 || s->len == 0)
return;
  
-	seq_buf_terminate(s);

-
-   start = s->buffer;
+   start = seq_buf_str(s);
while ((lf = strchr(start, '\n'))) {
int len = lf - start + 1;