On 4/13/26 2:33 PM, Cao Ruichuang wrote:
> The previous module tracepoint_string() fix took the smallest
> implementation path and reused the existing module trace_printk format
> storage.
> 
> That was enough to make module __tracepoint_str entries show up in
> printk_formats and be accepted by trace_is_tracepoint_string(), but it
> also made those copied mappings persist after module unload. That does
> not match the expected module lifetime semantics.
> 
> Handle module tracepoint_string() mappings separately instead of mixing
> them into the module trace_printk format list. Keep copying the strings
> into tracing-managed storage while the module is loaded, but track them
> on their own list and drop them again on MODULE_STATE_GOING.
> 
> Keep module trace_printk format handling unchanged.
> 
> This split is intentional: module trace_printk formats and module
> tracepoint_string() mappings do not have the same lifetime requirements.
> Keeping them in one shared structure would either preserve
> tracepoint_string() mappings too long again, or require mixed
> ownership/refcount rules in a trace_printk-oriented structure.
> 
> The separate module tracepoint_string() list intentionally keeps one
> copied mapping per module entry instead of trying to share copies across
> modules by string contents. printk_formats is address-based, and sharing
> those copies would add another layer of shared ownership/refcounting
> without changing the lifetime rule this fix is trying to restore.

I believe it would be more productive to first answer the question in my
previous email and give others time to comment on the noted issue and
discuss a possible solution. The previous patch hasn't been accepted
yet, so sending a new patch on top of it just hours after my comment
seems unusual.

The patch contains some unrelated changes, such as renaming last_index
to next_index in find_next(). I suspect it was generated with the help
of AI but it isn't attributed as such. Please check
Documentation/process/coding-assistants.rst.

The implementation splits the trace_bprintk_fmt_list into two lists.
While this looks sensible, it creates nearly duplicate code for handling
the two lists, for example, hold_module_trace_format() and
hold_module_trace_format(), or lookup_mod_format_ptr() and
lookup_mod_tracepoint_ptr(). An alternative would be to keep one list
and have a 'struct module *mod' member to differentiate between the
strings. 'mod == NULL' for printk fmts and 'mod != NULL' for tracepoint
strings.

I also considered whether kernel/trace/trace_printk.c could avoid
tracking tracepoint strings altogether and instead have
find_next_mod_entry() directly traverse the modules list and
module::tracepoint_strings_start. However, this would require taking
both btrace_mutex and the RCU lock, and I believe it would be more
complex.

Since the tracepoint strings are now tracked only for the duration of
a module's lifetime, kernel/trace/trace_printk.c doesn't need to copy
the strings and can reference directly the original module data.
Alternatively, the original strings could be placed in an .init section
so that once trace_printk.c copies them and the module is initialized,
they get dropped.

-- 
Thanks,
Petr


> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217196
> Signed-off-by: Cao Ruichuang <[email protected]>
> ---
>  include/linux/tracepoint.h  |   9 +-
>  kernel/trace/trace_printk.c | 250 ++++++++++++++++++++++--------------
>  2 files changed, 157 insertions(+), 102 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index f14da542402..aec598a4017 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -479,11 +479,10 @@ static inline struct tracepoint 
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>   *
>   * For built-in code, the tracing system uses the original string address.
>   * For modules, the tracing code saves tracepoint strings into
> - * tracing-managed storage when the module loads, so their mappings remain
> - * available through printk_formats and trace string checks even after the
> - * module's own memory goes away. As long as the string does not change
> - * during the life of the module, it is fine to use tracepoint_string()
> - * within a module.
> + * tracing-managed storage while the module is loaded, and drops those
> + * mappings again when the module unloads. As long as the string does not
> + * change during the life of the module, it is fine to use
> + * tracepoint_string() within a module.
>   */
>  #define tracepoint_string(str)                                               
> \
>       ({                                                              \
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 9f67ce42ef6..0420ffcff93 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -21,24 +21,24 @@
>  
>  #ifdef CONFIG_MODULES
>  
> -/*
> - * modules trace_printk() formats and tracepoint_string() strings are
> - * autosaved in struct trace_bprintk_fmt, which are queued on
> - * trace_bprintk_fmt_list.
> - */
> +/* module trace_printk() formats are autosaved on trace_bprintk_fmt_list. */
>  static LIST_HEAD(trace_bprintk_fmt_list);
> +/* module tracepoint_string() copies live on tracepoint_string_list. */
> +static LIST_HEAD(tracepoint_string_list);
>  
> -/* serialize accesses to trace_bprintk_fmt_list */
> +/* serialize accesses to module format and tracepoint-string lists */
>  static DEFINE_MUTEX(btrace_mutex);
>  
>  struct trace_bprintk_fmt {
>       struct list_head list;
>       const char *fmt;
> -     unsigned int type;
>  };
>  
> -#define TRACE_BPRINTK_TYPE           BIT(0)
> -#define TRACE_TRACEPOINT_TYPE                BIT(1)
> +struct tracepoint_string_entry {
> +     struct list_head list;
> +     struct module *mod;
> +     const char *str;
> +};
>  
>  static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
>  {
> @@ -54,24 +54,21 @@ static inline struct trace_bprintk_fmt 
> *lookup_format(const char *fmt)
>       return NULL;
>  }
>  
> -static void hold_module_trace_format(const char **start, const char **end,
> -                                  unsigned int type)
> +static void hold_module_trace_bprintk_format(const char **start, const char 
> **end)
>  {
>       const char **iter;
>       char *fmt;
>  
>       /* allocate the trace_printk per cpu buffers */
> -     if ((type & TRACE_BPRINTK_TYPE) && start != end)
> +     if (start != end)
>               trace_printk_init_buffers();
>  
>       mutex_lock(&btrace_mutex);
>       for (iter = start; iter < end; iter++) {
>               struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
>               if (tb_fmt) {
> -                     if (!IS_ERR(tb_fmt)) {
> -                             tb_fmt->type |= type;
> +                     if (!IS_ERR(tb_fmt))
>                               *iter = tb_fmt->fmt;
> -                     }
>                       continue;
>               }
>  
> @@ -83,7 +80,6 @@ static void hold_module_trace_format(const char **start, 
> const char **end,
>                               list_add_tail(&tb_fmt->list, 
> &trace_bprintk_fmt_list);
>                               strcpy(fmt, *iter);
>                               tb_fmt->fmt = fmt;
> -                             tb_fmt->type = type;
>                       } else
>                               kfree(tb_fmt);
>               }
> @@ -93,89 +89,156 @@ static void hold_module_trace_format(const char **start, 
> const char **end,
>       mutex_unlock(&btrace_mutex);
>  }
>  
> +static void hold_module_tracepoint_strings(struct module *mod,
> +                                        const char **start,
> +                                        const char **end)
> +{
> +     const char **iter;
> +
> +     mutex_lock(&btrace_mutex);
> +     for (iter = start; iter < end; iter++) {
> +             struct tracepoint_string_entry *tp_entry;
> +             char *str;
> +
> +             tp_entry = kmalloc_obj(*tp_entry);
> +             if (!tp_entry)
> +                     continue;
> +
> +             str = kstrdup(*iter, GFP_KERNEL);
> +             if (!str) {
> +                     kfree(tp_entry);
> +                     continue;
> +             }
> +
> +             tp_entry->mod = mod;
> +             tp_entry->str = str;
> +             list_add_tail(&tp_entry->list, &tracepoint_string_list);
> +             *iter = tp_entry->str;
> +     }
> +     mutex_unlock(&btrace_mutex);
> +}
> +
> +static void release_module_tracepoint_strings(struct module *mod)
> +{
> +     struct tracepoint_string_entry *tp_entry, *n;
> +
> +     mutex_lock(&btrace_mutex);
> +     list_for_each_entry_safe(tp_entry, n, &tracepoint_string_list, list) {
> +             if (tp_entry->mod != mod)
> +                     continue;
> +             list_del(&tp_entry->list);
> +             kfree(tp_entry->str);
> +             kfree(tp_entry);
> +     }
> +     mutex_unlock(&btrace_mutex);
> +}
> +
>  static int module_trace_format_notify(struct notifier_block *self,
>                                     unsigned long val, void *data)
>  {
>       struct module *mod = data;
>  
> -     if (val != MODULE_STATE_COMING)
> -             return NOTIFY_OK;
> +     switch (val) {
> +     case MODULE_STATE_COMING:
> +             if (mod->num_trace_bprintk_fmt) {
> +                     const char **start = mod->trace_bprintk_fmt_start;
> +                     const char **end = start + mod->num_trace_bprintk_fmt;
>  
> -     if (mod->num_trace_bprintk_fmt) {
> -             const char **start = mod->trace_bprintk_fmt_start;
> -             const char **end = start + mod->num_trace_bprintk_fmt;
> +                     hold_module_trace_bprintk_format(start, end);
> +             }
> +
> +             if (mod->num_tracepoint_strings) {
> +                     const char **start = mod->tracepoint_strings_start;
> +                     const char **end = start + mod->num_tracepoint_strings;
>  
> -             hold_module_trace_format(start, end, TRACE_BPRINTK_TYPE);
> +                     hold_module_tracepoint_strings(mod, start, end);
> +             }
> +             break;
> +     case MODULE_STATE_GOING:
> +             release_module_tracepoint_strings(mod);
> +             break;
>       }
>  
> -     if (mod->num_tracepoint_strings) {
> -             const char **start = mod->tracepoint_strings_start;
> -             const char **end = start + mod->num_tracepoint_strings;
> +     return NOTIFY_OK;
> +}
>  
> -             hold_module_trace_format(start, end, TRACE_TRACEPOINT_TYPE);
> +static const char **find_first_mod_entry(void)
> +{
> +     struct trace_bprintk_fmt *tb_fmt;
> +     struct tracepoint_string_entry *tp_entry;
> +
> +     if (!list_empty(&trace_bprintk_fmt_list)) {
> +             tb_fmt = list_first_entry(&trace_bprintk_fmt_list,
> +                                       typeof(*tb_fmt), list);
> +             return &tb_fmt->fmt;
>       }
>  
> -     return NOTIFY_OK;
> +     if (!list_empty(&tracepoint_string_list)) {
> +             tp_entry = list_first_entry(&tracepoint_string_list,
> +                                         typeof(*tp_entry), list);
> +             return &tp_entry->str;
> +     }
> +
> +     return NULL;
>  }
>  
> -/*
> - * The debugfs/tracing/printk_formats file maps the addresses with
> - * the ASCII formats that are used in the bprintk events in the
> - * buffer. For userspace tools to be able to decode the events from
> - * the buffer, they need to be able to map the address with the format.
> - *
> - * The addresses of the bprintk formats are in their own section
> - * __trace_printk_fmt. But for modules we copy them into a link list.
> - * The code to print the formats and their addresses passes around the
> - * address of the fmt string. If the fmt address passed into the seq
> - * functions is within the kernel core __trace_printk_fmt section, then
> - * it simply uses the next pointer in the list.
> - *
> - * When the fmt pointer is outside the kernel core __trace_printk_fmt
> - * section, then we need to read the link list pointers. The trick is
> - * we pass the address of the string to the seq function just like
> - * we do for the kernel core formats. To get back the structure that
> - * holds the format, we simply use container_of() and then go to the
> - * next format in the list.
> - */
> -static const char **
> -find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +static struct trace_bprintk_fmt *lookup_mod_format_ptr(const char **fmt_ptr)
>  {
> -     struct trace_bprintk_fmt *mod_fmt;
> +     struct trace_bprintk_fmt *tb_fmt;
>  
> -     if (list_empty(&trace_bprintk_fmt_list))
> -             return NULL;
> +     list_for_each_entry(tb_fmt, &trace_bprintk_fmt_list, list) {
> +             if (fmt_ptr == &tb_fmt->fmt)
> +                     return tb_fmt;
> +     }
>  
> -     /*
> -      * v will point to the address of the fmt record from t_next
> -      * v will be NULL from t_start.
> -      * If this is the first pointer or called from start
> -      * then we need to walk the list.
> -      */
> -     if (!v || start_index == *pos) {
> -             struct trace_bprintk_fmt *p;
> -
> -             /* search the module list */
> -             list_for_each_entry(p, &trace_bprintk_fmt_list, list) {
> -                     if (start_index == *pos)
> -                             return &p->fmt;
> -                     start_index++;
> +     return NULL;
> +}
> +
> +static struct tracepoint_string_entry *lookup_mod_tracepoint_ptr(const char 
> **str_ptr)
> +{
> +     struct tracepoint_string_entry *tp_entry;
> +
> +     list_for_each_entry(tp_entry, &tracepoint_string_list, list) {
> +             if (str_ptr == &tp_entry->str)
> +                     return tp_entry;
> +     }
> +
> +     return NULL;
> +}
> +
> +static const char **find_next_mod_entry(int start_index, void *v, loff_t 
> *pos)
> +{
> +     struct trace_bprintk_fmt *tb_fmt;
> +     struct tracepoint_string_entry *tp_entry;
> +
> +     if (!v || start_index == *pos)
> +             return find_first_mod_entry();
> +
> +     tb_fmt = lookup_mod_format_ptr(v);
> +     if (tb_fmt) {
> +             if (tb_fmt->list.next != &trace_bprintk_fmt_list) {
> +                     tb_fmt = list_next_entry(tb_fmt, list);
> +                     return &tb_fmt->fmt;
>               }
> -             /* pos > index */
> +
> +             if (!list_empty(&tracepoint_string_list)) {
> +                     tp_entry = list_first_entry(&tracepoint_string_list,
> +                                                 typeof(*tp_entry), list);
> +                     return &tp_entry->str;
> +             }
> +
>               return NULL;
>       }
>  
> -     /*
> -      * v points to the address of the fmt field in the mod list
> -      * structure that holds the module print format.
> -      */
> -     mod_fmt = container_of(v, typeof(*mod_fmt), fmt);
> -     if (mod_fmt->list.next == &trace_bprintk_fmt_list)
> +     tp_entry = lookup_mod_tracepoint_ptr(v);
> +     if (!tp_entry)
>               return NULL;
>  
> -     mod_fmt = container_of(mod_fmt->list.next, typeof(*mod_fmt), list);
> +     if (tp_entry->list.next == &tracepoint_string_list)
> +             return NULL;
>  
> -     return &mod_fmt->fmt;
> +     tp_entry = list_next_entry(tp_entry, list);
> +     return &tp_entry->str;
>  }
>  
>  static void format_mod_start(void)
> @@ -195,8 +258,8 @@ module_trace_format_notify(struct notifier_block *self,
>  {
>       return NOTIFY_OK;
>  }
> -static inline const char **
> -find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +static inline const char **find_next_mod_entry(int start_index, void *v,
> +                                            loff_t *pos)
>  {
>       return NULL;
>  }
> @@ -274,7 +337,7 @@ bool trace_is_tracepoint_string(const char *str)
>  {
>       const char **ptr = __start___tracepoint_str;
>  #ifdef CONFIG_MODULES
> -     struct trace_bprintk_fmt *tb_fmt;
> +     struct tracepoint_string_entry *tp_entry;
>  #endif
>  
>       for (ptr = __start___tracepoint_str; ptr < __stop___tracepoint_str; 
> ptr++) {
> @@ -284,8 +347,8 @@ bool trace_is_tracepoint_string(const char *str)
>  
>  #ifdef CONFIG_MODULES
>       mutex_lock(&btrace_mutex);
> -     list_for_each_entry(tb_fmt, &trace_bprintk_fmt_list, list) {
> -             if ((tb_fmt->type & TRACE_TRACEPOINT_TYPE) && str == 
> tb_fmt->fmt) {
> +     list_for_each_entry(tp_entry, &tracepoint_string_list, list) {
> +             if (str == tp_entry->str) {
>                       mutex_unlock(&btrace_mutex);
>                       return true;
>               }
> @@ -297,9 +360,8 @@ bool trace_is_tracepoint_string(const char *str)
>  
>  static const char **find_next(void *v, loff_t *pos)
>  {
> -     const char **fmt = v;
>       int start_index;
> -     int last_index;
> +     int next_index;
>  
>       start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;
>  
> @@ -307,25 +369,19 @@ static const char **find_next(void *v, loff_t *pos)
>               return __start___trace_bprintk_fmt + *pos;
>  
>       /*
> -      * The __tracepoint_str section is treated the same as the
> -      * __trace_printk_fmt section. The difference is that the
> -      * __trace_printk_fmt section should only be used by trace_printk()
> -      * in a debugging environment, as if anything exists in that section
> -      * the trace_prink() helper buffers are allocated, which would just
> -      * waste space in a production environment.
> -      *
> -      * The __tracepoint_str sections on the other hand are used by
> -      * tracepoints which need to map pointers to their strings to
> -      * the ASCII text for userspace.
> +      * Built-in __tracepoint_str entries are exported directly from the
> +      * core section. Module tracepoint_string() mappings are kept on a
> +      * separate tracing-managed list below, because their lifetime is tied
> +      * to module load/unload and differs from module trace_printk() formats.
>        */
> -     last_index = start_index;
> +     next_index = start_index;
>       start_index = __stop___tracepoint_str - __start___tracepoint_str;
>  
> -     if (*pos < last_index + start_index)
> -             return __start___tracepoint_str + (*pos - last_index);
> +     if (*pos < next_index + start_index)
> +             return __start___tracepoint_str + (*pos - next_index);
>  
> -     start_index += last_index;
> -     return find_next_mod_format(start_index, v, fmt, pos);
> +     start_index += next_index;
> +     return find_next_mod_entry(start_index, v, pos);
>  }
>  
>  static void *


Reply via email to