Re: [PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()

2024-06-04 Thread kernel test robot
Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.10-rc2 next-20240604]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/ftrace-Rename-dup_hash-and-comment-it/20240605-053138
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240604212855.046127611%40goodmis.org
patch subject: [PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()
config: i386-buildonly-randconfig-004-20240605 (https://download.01.org/0day-ci/archive/20240605/202406051211.ta5ooyjm-...@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240605/202406051211.ta5ooyjm-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202406051211.ta5ooyjm-...@intel.com/

All error/warnings (new ones prefixed by >>):

>> kernel/trace/ftrace.c:1961:13: error: conflicting types for 'ftrace_hash_rec_disable_modify'; have 'void(struct ftrace_ops *, bool)' {aka 'void(struct ftrace_ops *, _Bool)'}
    1961 | static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/ftrace.c:1384:1: note: previous declaration of 'ftrace_hash_rec_disable_modify' with type 'void(struct ftrace_ops *, int)'
    1384 | ftrace_hash_rec_disable_modify(struct ftrace_ops *ops, int filter_hash);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/trace/ftrace.c:1967:13: error: conflicting types for 'ftrace_hash_rec_enable_modify'; have 'void(struct ftrace_ops *, bool)' {aka 'void(struct ftrace_ops *, _Bool)'}
    1967 | static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/ftrace.c:1386:1: note: previous declaration of 'ftrace_hash_rec_enable_modify' with type 'void(struct ftrace_ops *, int)'
    1386 | ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, int filter_hash);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/trace/ftrace.c:1384:1: warning: 'ftrace_hash_rec_disable_modify' used but never defined
    1384 | ftrace_hash_rec_disable_modify(struct ftrace_ops *ops, int filter_hash);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/trace/ftrace.c:1386:1: warning: 'ftrace_hash_rec_enable_modify' used but never defined
    1386 | ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, int filter_hash);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/trace/ftrace.c:1967:13: warning: 'ftrace_hash_rec_enable_modify' defined but not used [-Wunused-function]
    1967 | static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/trace/ftrace.c:1961:13: warning: 'ftrace_hash_rec_disable_modify' defined but not used [-Wunused-function]
    1961 | static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +1961 kernel/trace/ftrace.c

84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1960
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18 @1961  static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops,
5177364f840058 Steven Rostedt (Google)  2024-06-04  1962  					     bool filter_hash)
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1963  {
5177364f840058 Steven Rostedt (Google)  2024-06-04  1964  	ftrace_hash_rec_update_modify(ops, filter_hash, false);
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1965  }
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1966
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18 @1967  static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
5177364f840058 Steven Rostedt (Google)  2024-06-04  1968  					    bool filter_hash)
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1969  {
5177364f840058 Steven Rostedt (Google)  2024-06-04  1970  	ftrace_hash_rec_update_modify(ops, filter_hash, true);
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1971  }
84261912ebee41 Steven Rostedt (Red Hat) 2014-08-18  1972

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread Song Liu
On Tue, May 21, 2024 at 1:04 AM Petr Mladek  wrote:
[...]
> >
> > Yes, but the information you get is limited compared to what is available
> > now. You would obtain the information that a patched function was called
> > but ftrace could also give you the context and more.
>
> Another motivation to use ftrace for testing is that it does not
> affect the performance in production.
>
> We should keep klp_ftrace_handler() as fast as possible so that we
> could livepatch also performance sensitive functions.

At LPC last year, we discussed adding a counter to each
klp_func, like:

struct klp_func {
	...
	u64 __percpu *counter;
	...
};

With some static_key (+ sysctl), this should give us a way to estimate
the overhead of livepatch. If we have the counter, this patch is not
needed any more. Does this (adding the counter) sound like
something we still want to pursue?
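
A rough sketch of the idea (all names below are hypothetical; the real
struct klp_func and klp_ftrace_handler() differ):

#include <linux/percpu.h>
#include <linux/jump_label.h>

struct klp_func_demo {
	u64 __percpu *counter;	/* allocated with alloc_percpu(u64) at patch load */
};

static DEFINE_STATIC_KEY_FALSE(klp_count_calls);	/* toggled via a sysctl */

static void klp_ftrace_handler_demo(struct klp_func_demo *func)
{
	/* Nearly free while the static key is off; one per-cpu inc when on. */
	if (static_branch_unlikely(&klp_count_calls))
		this_cpu_inc(*func->counter);

	/* ... existing redirection to the patched function ... */
}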

Thanks,
Song



[PATCH] kallsyms, livepatch: Fix livepatch with CONFIG_LTO_CLANG

2024-06-04 Thread Song Liu
With CONFIG_LTO_CLANG, the compiler may postfix symbols with .llvm.
to avoid symbol duplication. scripts/kallsyms.c sorts the symbols
without these postfixes. The default symbol lookup also removes these
postfixes before comparing symbols.

On the other hand, livepatch needs to look up symbols by their full names.
However, calling kallsyms_on_each_match_symbol with the full name (with the
postfix) cannot find the symbol(s). As a result, we cannot livepatch
kernel functions with a .llvm. postfix, or kernel functions that use
relocation information to symbols with .llvm. postfixes.

Fix this by calling kallsyms_on_each_match_symbol without the postfix,
and then matching the full name (with the postfix) in klp_match_callback.

Signed-off-by: Song Liu 
---
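To make the matching flow concrete, here is a sketch (the symbol name is
made up; the helpers are the ones added by this patch):

/*
 * The compiler emits e.g. "foo.llvm.123456". kallsyms sorts and bisects
 * by the cleaned name, so search with "foo" and let klp_match_callback
 * reject candidates whose full name is not "foo.llvm.123456".
 */
static int demo_find(struct klp_find_arg *args)
{
	char clean[KSYM_NAME_LEN];

	strscpy(clean, args->name, sizeof(clean));	/* "foo.llvm.123456" */
	kallsyms_cleanup_symbol_name(clean);		/* -> "foo" */

	return kallsyms_on_each_match_symbol(klp_match_callback, clean, args);
}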
 include/linux/kallsyms.h | 13 +
 kernel/kallsyms.c        | 21 -
 kernel/livepatch/core.c  | 32 +++-
 3 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index c3f075e8f60c..d7d07a134ae4 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -97,6 +97,10 @@ extern int sprint_backtrace_build_id(char *buffer, unsigned long address);
 
 int lookup_symbol_name(unsigned long addr, char *symname);
 
+int kallsyms_lookup_symbol_full_name(unsigned long addr, char *symname);
+
+void kallsyms_cleanup_symbol_name(char *s);
+
 #else /* !CONFIG_KALLSYMS */
 
 static inline unsigned long kallsyms_lookup_name(const char *name)
@@ -165,6 +169,15 @@ static inline int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long)
 {
return -EOPNOTSUPP;
 }
+
+static inline int kallsyms_lookup_symbol_full_name(unsigned long addr, char *symname)
+{
+   return -ERANGE;
+}
+
+static inline void kallsyms_cleanup_symbol_name(char *s)
+{
+}
 #endif /*CONFIG_KALLSYMS*/
 
 static inline void print_ip_sym(const char *loglvl, unsigned long ip)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 22ea19a36e6e..f0d04fa1bbb4 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -163,7 +163,7 @@ unsigned long kallsyms_sym_address(int idx)
return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
 }
 
-static void cleanup_symbol_name(char *s)
+void kallsyms_cleanup_symbol_name(char *s)
 {
char *res;
 
@@ -191,7 +191,7 @@ static int compare_symbol_name(const char *name, char *namebuf)
 * To ensure correct bisection in kallsyms_lookup_names(), do
 * cleanup_symbol_name(namebuf) before comparing name and namebuf.
 */
-   cleanup_symbol_name(namebuf);
+   kallsyms_cleanup_symbol_name(namebuf);
return strcmp(name, namebuf);
 }
 
@@ -426,7 +426,7 @@ static const char *kallsyms_lookup_buildid(unsigned long addr,
offset, modname, namebuf);
 
 found:
-   cleanup_symbol_name(namebuf);
+   kallsyms_cleanup_symbol_name(namebuf);
return ret;
 }
 
@@ -446,7 +446,7 @@ const char *kallsyms_lookup(unsigned long addr,
   NULL, namebuf);
 }
 
-int lookup_symbol_name(unsigned long addr, char *symname)
+static int __lookup_symbol_name(unsigned long addr, char *symname, bool cleanup)
 {
int res;
 
@@ -468,10 +468,21 @@ int lookup_symbol_name(unsigned long addr, char *symname)
return res;
 
 found:
-   cleanup_symbol_name(symname);
+   if (cleanup)
+   kallsyms_cleanup_symbol_name(symname);
return 0;
 }
 
+int lookup_symbol_name(unsigned long addr, char *symname)
+{
+   return __lookup_symbol_name(addr, symname, true);
+}
+
+int kallsyms_lookup_symbol_full_name(unsigned long addr, char *symname)
+{
+   return __lookup_symbol_name(addr, symname, false);
+}
+
 /* Look up a kernel symbol and return it in a text buffer. */
 static int __sprint_symbol(char *buffer, unsigned long address,
   int symbol_offset, int add_offset, int add_buildid)
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 52426665eecc..284220e04801 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -128,6 +128,19 @@ struct klp_find_arg {
 static int klp_match_callback(void *data, unsigned long addr)
 {
struct klp_find_arg *args = data;
+#ifdef CONFIG_LTO_CLANG
+   char full_name[KSYM_NAME_LEN];
+
+   /*
+* With CONFIG_LTO_CLANG, we need to compare the full name of the
+* symbol (with .llvm. postfix).
+*/
+   if (kallsyms_lookup_symbol_full_name(addr, full_name))
+   return 0;
+
+   if (strcmp(args->name, full_name))
+   return 0;
+#endif
 
args->addr = addr;
args->count++;
@@ -145,10 +158,12 @@ static int klp_match_callback(void *data, unsigned long addr)
 
 static int klp_find_callback(void *data, const char *name, unsigned long addr)
 {
+#ifndef CONFIG_LTO_CLANG
struct klp_find_arg *args = data;
 
if (strcmp(args->name, 

Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread zhang warden
Hi Joe,

> 
> Perhaps "responsibility" is a better description.  This would introduce
> an attribute that someone's userspace utility is relying on.  It should
> at least have a kselftest to ensure a random patch in 2027 doesn't break
> it.
I sent this patch to see what the community thinks about this attribute
(although I think it is necessary, and it will be more convenient for users).

If this patch looks good, I will add a kselftest for this attribute.

As Miroslav and Petr said, keeping klp_ftrace_handler() as fast as possible is
also important, so I need to find a way to keep it fast (or perhaps just set
the state to true unconditionally instead of testing it first?).
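
A self-contained sketch of the two options being weighed (the names are
illustrative, not the real klp_func):

#include <stdbool.h>

struct klp_func_demo {
	bool called;	/* the attribute proposed in this thread */
};

/* Option 1: unconditional store - no branch on the hot path. */
static inline void mark_called_store(struct klp_func_demo *f)
{
	f->called = true;
}

/* Option 2: test-then-set - skips redundant stores, at the cost of a
 * test+branch on every call. */
static inline void mark_called_branch(struct klp_func_demo *f)
{
	if (!f->called)
		f->called = true;
}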

> The kernel docs provide a lot of explanation of the complete ftracing
> interface.  It's pretty powerful stuff, though you may also go the other
> direction and look into using the trace-cmd front end to simplify all of
> the sysfs manipulation.  Julia Evans wrote a blog [1] a while back that
> provides some more examples.
> 
> [1] https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/
> 
> --
> Joe

Very kind of you! Thanks! I will study it!

Regards,
Wardenjohn




Re: [PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()

2024-06-04 Thread Steven Rostedt
On Wed, 5 Jun 2024 09:12:57 +0800
kernel test robot  wrote:

> >> kernel/trace/ftrace.c:1961:13: error: conflicting types for 'ftrace_hash_rec_disable_modify'
> 1961 | static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops,
>      |             ^
>    kernel/trace/ftrace.c:1384:1: note: previous declaration is here
> 1384 | ftrace_hash_rec_disable_modify(struct ftrace_ops *ops, int filter_hash);
>      | ^
> >> kernel/trace/ftrace.c:1967:13: error: conflicting types for 'ftrace_hash_rec_enable_modify'
> 1967 | static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,

Of course these static functions have prototypes to be used earlier. Bah!

-- Steve



[PATCH] rpmsg: char: add missing MODULE_DESCRIPTION() macro

2024-06-04 Thread Jeff Johnson
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/rpmsg/rpmsg_char.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson 
---
 drivers/rpmsg/rpmsg_char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index d7a342510902..73b9fa113b34 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -566,4 +566,5 @@ static void rpmsg_chrdev_exit(void)
 module_exit(rpmsg_chrdev_exit);
 
 MODULE_ALIAS("rpmsg:rpmsg_chrdev");
+MODULE_DESCRIPTION("RPMSG device interface");
 MODULE_LICENSE("GPL v2");

---
base-commit: a693b9c95abd4947c2d06e05733de5d470ab6586
change-id: 20240604-md-drivers-rpmsg_char-02914d244ec9




Re: [PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()

2024-06-04 Thread kernel test robot
Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.10-rc2 next-20240604]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/ftrace-Rename-dup_hash-and-comment-it/20240605-053138
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240604212855.046127611%40goodmis.org
patch subject: [PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()
config: s390-defconfig (https://download.01.org/0day-ci/archive/20240605/202406050838.7r32jzdi-...@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project d7d2d4f53fc79b4b58e8d8d08151b577c3699d4a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240605/202406050838.7r32jzdi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202406050838.7r32jzdi-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/trace/ftrace.c:17:
   In file included from include/linux/stop_machine.h:5:
   In file included from include/linux/cpu.h:17:
   In file included from include/linux/node.h:18:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:173:
   In file included from arch/s390/include/asm/mmu_context.h:11:
   In file included from arch/s390/include/asm/pgalloc.h:18:
   In file included from include/linux/mm.h:2245:
   include/linux/vmstat.h:484:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     484 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~ ^
     485 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:491:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     491 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~ ^
     492 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:498:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     498 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:503:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     503 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~ ^
     504 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:512:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     512 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~ ^
     513 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   In file included from kernel/trace/ftrace.c:18:
   In file included from include/linux/clocksource.h:22:
   In file included from arch/s390/include/asm/io.h:93:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^

Re: [syzbot] [usb?] [input?] INFO: task hung in __input_unregister_device (5)

2024-06-04 Thread syzbot
syzbot has bisected this issue to:

commit 6b0b708f12d18f9cccfb1c418bea59fcbff8798c
Author: Takashi Sakamoto 
Date:   Wed May 1 07:32:38 2024 +

firewire: core: add tracepoint event for handling bus reset

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14969a1698
start commit:   e0cce98fe279 Merge tag 'tpmdd-next-6.10-rc2' of git://git...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12969a1698
kernel config:  https://syzkaller.appspot.com/x/.config?x=238430243a58f702
dashboard link: https://syzkaller.appspot.com/bug?extid=78e2288f58b881ed3c45
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1318e16298
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=122e8eaa98

Reported-by: syzbot+78e2288f58b881ed3...@syzkaller.appspotmail.com
Fixes: 6b0b708f12d1 ("firewire: core: add tracepoint event for handling bus reset")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



Re: [PATCH] ftrace: adding the missing parameter descriptions of unregister_ftrace_direct

2024-06-04 Thread Marilene Andrade Garcia

On 27/05/2024 21:50, MarileneGarcia wrote:

Adding the description of the parameters addr and free_filters
of the function unregister_ftrace_direct.

Signed-off-by: MarileneGarcia 
---
Hello,
These changes fix the following compiler warnings of the function
unregister_ftrace_direct.

The warnings happen when using the GCC compiler, enabling the ftrace-related
configs, and using the command 'make W=1'.

kernel/trace/ftrace.c:5489: warning: Function parameter or struct member
'addr' not described in 'unregister_ftrace_direct'

kernel/trace/ftrace.c:5489: warning: Function parameter or struct member
'free_filters' not described in 'unregister_ftrace_direct'

  kernel/trace/ftrace.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 65208d3b5ed9..6062e4ce1957 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5475,6 +5475,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
   * unregister_ftrace_direct - Remove calls to custom trampoline
   * previously registered by register_ftrace_direct for @ops object.
   * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the trampoline to call at @ops functions
+ * @free_filters: non-zero to remove all filters for the ftrace_ops
   *
   * This is used to remove a direct calls to @addr from the nop locations
   * of the functions registered in @ops (with by ftrace_set_filter_ip


Hello everyone,
Do you have any feedback for me about the patch?

Thank you,
Marilene



Re: [PATCH V1] rpmsg: glink: Make glink smem interrupt wakeup capable

2024-06-04 Thread Chris Lew




On 6/3/2024 2:37 AM, Caleb Connolly wrote:

Hi Deepak,

On 03/06/2024 09:36, Deepak Kumar Singh wrote:

There are certain use cases which require the glink interrupt to be
wakeup capable. For example, if the handset is in sleep state and a
usb charger is plugged in, the dsp wakes up and sends a glink interrupt
to the host for glink pmic channel communication. Glink is supposed to
wake up the host processor completely for further glink data handling.
IRQF_NO_SUSPEND does not guarantee a complete wakeup; the system may
re-enter sleep after interrupt handling, and the glink data may not be
handled by the pmic client driver.

To ensure data handling by the client, configure the glink smem device as
a wakeup source and attach the glink interrupt as a wakeup irq. Remove
the IRQF_NO_SUSPEND flag as it is no longer required.


I'm not sure I agree with this approach, glink is used for lots of 
things -- like QRTR, where the sensor DSP and modem may also need to 
wake the system up (e.g. for "wake on pickup" on mobile, or for incoming 
calls/sms).


Configuring this to always wake up the system fully will result in a lot 
of spurious wakeups for arbitrary modem notifications (e.g. signal 
strength changes) if userspace hasn't properly configured these 
(something ModemManager currently lacks support for).


IRQF_NO_SUSPEND is presumably necessary to keep the DSPs happy? iirc 
downstream Qualcomm kernels have historically taken this approach to 
avoid spurious wakeups.




To give some more context, until recently the GLINK interrupt was 
managed and requested in the GLINK native layer. Any type of interrupt 
configuration would affect all of the links. The interrupt is now being 
requested at the transport layer (smem/rpm), so it allows a little more
fine-grained control.


In downstream, we had switched to IRQF_NO_SUSPEND because there were a 
couple of cases where glink communication with rpm was needed during the 
suspend path. Having the interrupt configured as wake capable conflicted 
with the use case.


The general expectation from the DSPs is that if it is important enough 
to send, then it should be important enough to wake the APPS subsystem. 
We've always had to work around the fact we were using IRQF_NO_SUSPEND 
in downstream.


I proposed an alternative approach some time back that would allow the 
wakeup to be configured on a per-channel basis.


https://lore.kernel.org/linux-arm-msm/20230117142414.983946-1-caleb.conno...@linaro.org/

Back then Bjorn proposed using some socket-specific mechanism to handle
this for QRTR, but given this is now a common issue for multiple glink
channels, maybe it's something we could revisit.


Requiring the wakeup to be enabled by userspace clearly doesn't make sense
for your proposed use case; perhaps there's a way to configure this on a
per-channel basis in-kernel (maybe via the rpmsg API?).




This alternative approach seems reasonable to me as well. I think the 
only drawback I see with this approach is that non-data traffic may stall. 
The glink protocol traffic not tied to a TX_DATA command, such as intent 
requests, wouldn't wake the system even if the channel is configured to 
be wake capable.
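
For illustration only, one shape the per-channel knob could take (nothing
below exists in the rpmsg API today; the field and helpers are invented
for this sketch):

#include <linux/rpmsg.h>
#include <linux/pm_wakeup.h>

/*
 * Hypothetical per-channel wakeup control: a client marks its channel
 * wake-capable, and the transport consults the flag on rx instead of
 * deciding globally for every link.
 */
struct rpmsg_wake_demo {
	struct rpmsg_endpoint *ept;
	bool wake_capable;		/* set by the client driver */
};

/* In the transport's rx path, only opted-in channels force a full wakeup: */
static void glink_rx_wake_demo(struct rpmsg_wake_demo *d)
{
	if (d->wake_capable)
		pm_wakeup_event(d->ept->rpdev->dev.parent, 0);
}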


Thanks,
Chris


Thanks and regards,


Signed-off-by: Deepak Kumar Singh 
---
  drivers/rpmsg/qcom_glink_smem.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/rpmsg/qcom_glink_smem.c 
b/drivers/rpmsg/qcom_glink_smem.c

index 7a982c60a8dd..f1b553efab13 100644
--- a/drivers/rpmsg/qcom_glink_smem.c
+++ b/drivers/rpmsg/qcom_glink_smem.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include <linux/pm_wakeirq.h>
  #include 
@@ -306,8 +307,7 @@ struct qcom_glink_smem *qcom_glink_smem_register(struct device *parent,

   smem->irq = of_irq_get(smem->dev.of_node, 0);
   ret = devm_request_irq(&smem->dev, smem->irq, qcom_glink_smem_intr,
-   IRQF_NO_SUSPEND | IRQF_NO_AUTOEN,
-   "glink-smem", smem);
+   IRQF_NO_AUTOEN, "glink-smem", smem);
   if (ret) {
   dev_err(&smem->dev, "failed to request IRQ\n");
   goto err_put_dev;
@@ -346,6 +346,8 @@ struct qcom_glink_smem *qcom_glink_smem_register(struct device *parent,

  smem->glink = glink;
+    device_init_wakeup(dev, true);
+    dev_pm_set_wake_irq(dev, smem->irq);
  enable_irq(smem->irq);
  return smem;
@@ -365,6 +367,8 @@ void qcom_glink_smem_unregister(struct qcom_glink_smem *smem)

   struct qcom_glink *glink = smem->glink;
   disable_irq(smem->irq);
+    dev_pm_clear_wake_irq(&smem->dev);
+    device_init_wakeup(&smem->dev, false);
  qcom_glink_native_remove(glink);






Re: [PATCH v14 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-06-04 Thread Jarkko Sakkinen
On Sat Jun 1, 2024 at 1:26 AM EEST, Haitao Huang wrote:
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with following
> expectations:
>
> 1) small - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) large - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) larger - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all pass.
> Then it reruns the same test with one process randomly killed and
> usage checked to be zero after all processes exit.
>
> The script also includes a test with low mem_cg limit and large sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to a proper mem_cg. For this test, it turns off swapping before start,
> and turns swapping back on afterwards.
>
> Add README to document how to run the tests.
>
> Signed-off-by: Haitao Huang 
> Reviewed-by: Jarkko Sakkinen 
> Tested-by: Jarkko Sakkinen 

Reorg:

void sgx_cgroup_init(void)
{
	struct workqueue_struct *wq;

	/* eagerly allocate the workqueue: */
	wq = alloc_workqueue("sgx_cg_wq", WQ_UNBOUND | WQ_FREEZABLE, WQ_UNBOUND_MAX_ACTIVE);
	if (!wq) {
		pr_warn("sgx_cg_wq creation failed\n");
		return;
	}

	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);

	/* Depending on misc state, keep or destroy the workqueue: */
	if (cgroup_subsys_enabled(misc_cgrp_subsys))
		sgx_cg_wq = wq;
	else
		destroy_workqueue(wq);
}

BTW, why are the two previous operations performed if the subsystem is not
enabled?

I.e. why not instead:

void sgx_cgroup_init(void)
{
	struct workqueue_struct *wq;

	/* Eagerly allocate the workqueue: */
	wq = alloc_workqueue("sgx_cg_wq", WQ_UNBOUND | WQ_FREEZABLE, WQ_UNBOUND_MAX_ACTIVE);
	if (!wq) {
		pr_warn("sgx_cg_wq creation failed\n");
		return;
	}

	if (!cgroup_subsys_enabled(misc_cgrp_subsys)) {
		destroy_workqueue(wq);
		return;
	}

	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);
	sgx_cg_wq = wq;
}

Finally, why does this not have __init? And neither does sgx_cgroup_misc_init().

The names for these are also somewhat confusing, maybe something like:

* __sgx_cgroups_misc_init()
* sgx_cgroups_misc_init()

And both with __init.

I just made a trivial checkpatch run as a final check, spotted the
warning on BUG_ON(), and noticed that this can't be right as it is, but
please comment and correct me where I might have gotten something wrong.

With the "--strict" flag I also caught these:

CHECK: spinlock_t definition without comment
#1308: FILE: arch/x86/kernel/cpu/sgx/sgx.h:122:
+   spinlock_t lock;

CHECK: multiple assignments should be avoided
#444: FILE: kernel/cgroup/misc.c:450:
+   parent_cg = cg = &root_cg;

BR, Jarkko



[RFC v3 net-next 7/7] af_packet: use sk_skb_reason_drop to free rx packets

2024-06-04 Thread Yan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/r/202406011859.aacus8gv-...@intel.com/
Signed-off-by: Yan Zhai 
---
v2->v3: fixed uninitialized sk, added missing report tags.
---
 net/packet/af_packet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fce390887591..42d29b8a84fc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2121,7 +2121,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
  struct packet_type *pt, struct net_device *orig_dev)
 {
enum skb_drop_reason drop_reason = SKB_CONSUMED;
-   struct sock *sk;
+   struct sock *sk = NULL;
struct sockaddr_ll *sll;
struct packet_sock *po;
u8 *skb_head = skb->data;
@@ -2226,7 +2226,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
skb->len = skb_len;
}
 drop:
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 }
 
@@ -2234,7 +2234,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
   struct packet_type *pt, struct net_device *orig_dev)
 {
enum skb_drop_reason drop_reason = SKB_CONSUMED;
-   struct sock *sk;
+   struct sock *sk = NULL;
struct packet_sock *po;
struct sockaddr_ll *sll;
union tpacket_uhdr h;
@@ -2494,7 +2494,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
skb->len = skb_len;
}
 drop:
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 
 drop_n_account:
@@ -2503,7 +2503,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
drop_reason = SKB_DROP_REASON_PACKET_SOCK_ERROR;
 
sk->sk_data_ready(sk);
-   kfree_skb_reason(copy_skb, drop_reason);
+   sk_skb_reason_drop(sk, copy_skb, drop_reason);
goto drop_n_restore;
 }
 
-- 
2.30.2





[RFC v3 net-next 6/7] udp: use sk_skb_reason_drop to free rx packets

2024-06-04 Thread Yan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/r/202406011751.npvn0ssk-...@intel.com/
Signed-off-by: Yan Zhai 
---
v2->v3: added missing report tags
---
 net/ipv4/udp.c | 10 +-
 net/ipv6/udp.c | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 189c9113fe9a..ecafb1695999 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2074,7 +2074,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
}
UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
trace_udp_fail_queue_rcv_skb(rc, sk, skb);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}
 
@@ -2196,7 +2196,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 drop:
__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
 }
 
@@ -2383,7 +2383,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
   int proto)
 {
-   struct sock *sk;
+   struct sock *sk = NULL;
struct udphdr *uh;
unsigned short ulen;
struct rtable *rt = skb_rtable(skb);
@@ -2460,7 +2460,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 * Hmm.  We got an UDP packet to a port to which we
 * don't wanna listen.  Ignore it.
 */
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 
 short_packet:
@@ -2485,7 +2485,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
 drop:
__UDP_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 }
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c81a07ac0463..b56f0b9f4307 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -673,7 +673,7 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
}
UDP6_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
trace_udp_fail_queue_rcv_skb(rc, sk, skb);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}
 
@@ -776,7 +776,7 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 drop:
__UDP6_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
 }
 
@@ -940,8 +940,8 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
const struct in6_addr *saddr, *daddr;
struct net *net = dev_net(skb->dev);
+   struct sock *sk = NULL;
struct udphdr *uh;
-   struct sock *sk;
bool refcounted;
u32 ulen = 0;
 
@@ -1033,7 +1033,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP6_INC_STATS(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0);
 
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return 0;
 
 short_packet:
@@ -1054,7 +1054,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP6_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
 discard:
__UDP6_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return 0;
 }
 
-- 
2.30.2





[RFC v3 net-next 5/7] tcp: use sk_skb_reason_drop to free rx packets

2024-06-04 Thread Yan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/r/202406011539.jhwbd7dx-...@intel.com/
Signed-off-by: Yan Zhai 
---
v2->v3: added missing report tags
---
 net/ipv4/syncookies.c | 2 +-
 net/ipv4/tcp_input.c  | 2 +-
 net/ipv4/tcp_ipv4.c   | 6 +++---
 net/ipv6/syncookies.c | 2 +-
 net/ipv6/tcp_ipv6.c   | 6 +++---
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index b61d36810fe3..1948d15f1f28 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -496,6 +496,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 out_free:
reqsk_free(req);
 out_drop:
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return NULL;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5aadf64e554d..bedb079de1f0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4859,7 +4859,7 @@ static void tcp_drop_reason(struct sock *sk, struct sk_buff *skb,
enum skb_drop_reason reason)
 {
sk_drops_add(sk, skb);
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
 }
 
 /* This one checks to see if we can put data from the
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 041c7eda9abe..f7a046bc4b27 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1939,7 +1939,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 reset:
tcp_v4_send_reset(rsk, skb, sk_rst_convert_drop_reason(reason));
 discard:
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
/* Be careful here. If this function gets more complicated and
 * gcc suffers from register pressure on the x86, sk (in %ebx)
 * might be destroyed here. This current version compiles correctly,
@@ -2176,8 +2176,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
int dif = inet_iif(skb);
const struct iphdr *iph;
const struct tcphdr *th;
+   struct sock *sk = NULL;
bool refcounted;
-   struct sock *sk;
int ret;
u32 isn;
 
@@ -2376,7 +2376,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 discard_it:
SKB_DR_OR(drop_reason, NOT_SPECIFIED);
/* Discard frame. */
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 
 discard_and_relse:
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index bfad1e89b6a6..9d83eadd308b 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -275,6 +275,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 out_free:
reqsk_free(req);
 out_drop:
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return NULL;
 }
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1ac7502e1bf5..93967accc35d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1678,7 +1678,7 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 discard:
if (opt_skb)
__kfree_skb(opt_skb);
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return 0;
 csum_err:
reason = SKB_DROP_REASON_TCP_CSUM;
@@ -1751,8 +1751,8 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
int dif = inet6_iif(skb);
const struct tcphdr *th;
const struct ipv6hdr *hdr;
+   struct sock *sk = NULL;
bool refcounted;
-   struct sock *sk;
int ret;
u32 isn;
struct net *net = dev_net(skb->dev);
@@ -1944,7 +1944,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 
 discard_it:
SKB_DR_OR(drop_reason, NOT_SPECIFIED);
-   kfree_skb_reason(skb, drop_reason);
+   sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
 
 discard_and_relse:
-- 
2.30.2





[RFC v3 net-next 4/7] net: raw: use sk_skb_reason_drop to free rx packets

2024-06-04 Thread Yan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai 
---
 net/ipv4/raw.c | 4 ++--
 net/ipv6/raw.c | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1a0953650356..474dfd263c8b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -301,7 +301,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
ipv4_pktinfo_prepare(sk, skb, true);
	if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}
 
@@ -312,7 +312,7 @@ int raw_rcv(struct sock *sk, struct sk_buff *skb)
 {
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY);
+   sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_XFRM_POLICY);
return NET_RX_DROP;
}
nf_reset_ct(skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index f838366e8256..608fa9d05b55 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -362,14 +362,14 @@ static inline int rawv6_rcv_skb(struct sock *sk, struct sk_buff *skb)
if ((raw6_sk(sk)->checksum || rcu_access_pointer(sk->sk_filter)) &&
skb_checksum_complete(skb)) {
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM);
+   sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_SKB_CSUM);
return NET_RX_DROP;
}
 
/* Charge it to the socket. */
skb_dst_drop(skb);
	if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}
 
@@ -390,7 +390,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
 
if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) {
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY);
+   sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_XFRM_POLICY);
return NET_RX_DROP;
}
nf_reset_ct(skb);
@@ -415,7 +415,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
if (inet_test_bit(HDRINCL, sk)) {
if (skb_checksum_complete(skb)) {
	atomic_inc(&sk->sk_drops);
-   kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM);
+   sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_SKB_CSUM);
return NET_RX_DROP;
}
}
-- 
2.30.2





[RFC v3 net-next 3/7] ping: use sk_skb_reason_drop to free rx packets

2024-06-04 Thread Yan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai 
---
 net/ipv4/ping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 823306487a82..619ddc087957 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -946,7 +946,7 @@ static enum skb_drop_reason __ping_queue_rcv_skb(struct sock *sk,
pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n",
 inet_sk(sk), inet_sk(sk)->inet_num, skb);
	if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
-   kfree_skb_reason(skb, reason);
+   sk_skb_reason_drop(sk, skb, reason);
pr_debug("ping_queue_rcv_skb -> failed\n");
return reason;
}
-- 
2.30.2





[RFC v3 net-next 2/7] net: introduce sk_skb_reason_drop function

2024-06-04 Thread Yan Zhai
Long used destructors kfree_skb and kfree_skb_reason do not pass
receiving socket to packet drop tracepoints trace_kfree_skb.
This makes it hard to track packet drops of a certain netns (container)
or a socket (user application).

The naming of these destructors is also not consistent with most sk/skb
operating functions, i.e. functions named "sk_xxx" or "skb_xxx".
Introduce a new function, sk_skb_reason_drop, as a drop-in replacement for
kfree_skb_reason on the local receiving path. Callers can now pass receiving
sockets to the tracepoints.

kfree_skb and kfree_skb_reason are still usable, but they are now just
inline helpers that call sk_skb_reason_drop.

Note it is not feasible to do the same to consume_skb. Packets not
dropped can flow through multiple receive handlers, and have multiple
receiving sockets. Leave it untouched for now.

Suggested-by: Eric Dumazet 
Signed-off-by: Yan Zhai 
---
v1->v2: changes function names to be more consistent with common sk/skb
operations
---
 include/linux/skbuff.h | 10 --
 net/core/skbuff.c  | 22 --
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe7d8dbef77e..c479a2515a62 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1251,8 +1251,14 @@ static inline bool skb_data_unref(const struct sk_buff *skb,
return true;
 }
 
-void __fix_address
-kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);
+void __fix_address sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
+ enum skb_drop_reason reason);
+
+static inline void
+kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+{
+   sk_skb_reason_drop(NULL, skb, reason);
+}
 
 /**
  * kfree_skb - free an sk_buff with 'NOT_SPECIFIED' reason
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2854afdd713f..9def11fe42c4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1190,7 +1190,8 @@ void __kfree_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(__kfree_skb);
 
 static __always_inline
-bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+bool __sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
+ enum skb_drop_reason reason)
 {
if (unlikely(!skb_unref(skb)))
return false;
@@ -1203,26 +1204,27 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
if (reason == SKB_CONSUMED)
trace_consume_skb(skb, __builtin_return_address(0));
else
-   trace_kfree_skb(skb, __builtin_return_address(0), reason, NULL);
+   trace_kfree_skb(skb, __builtin_return_address(0), reason, sk);
return true;
 }
 
 /**
- * kfree_skb_reason - free an sk_buff with special reason
+ * sk_skb_reason_drop - free an sk_buff with special reason
+ * @sk: the socket to receive @skb, or NULL if not applicable
  * @skb: buffer to free
  * @reason: reason why this skb is dropped
  *
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero. Meanwhile, pass the drop reason to 'kfree_skb'
- * tracepoint.
+ * Drop a reference to the buffer and free it if the usage count has hit
+ * zero. Meanwhile, pass the receiving socket and drop reason to
+ * 'kfree_skb' tracepoint.
  */
 void __fix_address
-kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason reason)
 {
-   if (__kfree_skb_reason(skb, reason))
+   if (__sk_skb_reason_drop(sk, skb, reason))
__kfree_skb(skb);
 }
-EXPORT_SYMBOL(kfree_skb_reason);
+EXPORT_SYMBOL(sk_skb_reason_drop);
 
 #define KFREE_SKB_BULK_SIZE16
 
@@ -1261,7 +1263,7 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
while (segs) {
struct sk_buff *next = segs->next;
 
-   if (__kfree_skb_reason(segs, reason)) {
+   if (__sk_skb_reason_drop(NULL, segs, reason)) {
skb_poison_list(segs);
			kfree_skb_add_bulk(segs, &sa, reason);
}
-- 
2.30.2





[RFC v3 net-next 1/7] net: add rx_sk to trace_kfree_skb

2024-06-04 Thread Yan Zhai
skb does not include enough information to find out receiving
sockets/services and netns/containers on packet drops. In theory
skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
stack for OOO packet lookup. Similarly, skb->sk often identifies a local
sender, and tells nothing about a receiver.

Allow passing an extra receiving socket to the tracepoint to improve
the visibility on receiving drops.

Signed-off-by: Yan Zhai 
---
v2->v3: fixed drop_monitor function prototype
---
 include/trace/events/skb.h | 11 +++
 net/core/dev.c |  2 +-
 net/core/drop_monitor.c|  9 ++---
 net/core/skbuff.c  |  2 +-
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..aa6b46b6172c 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -24,15 +24,16 @@ DEFINE_DROP_REASON(FN, FN)
 TRACE_EVENT(kfree_skb,
 
TP_PROTO(struct sk_buff *skb, void *location,
-enum skb_drop_reason reason),
+enum skb_drop_reason reason, struct sock *rx_sk),
 
-   TP_ARGS(skb, location, reason),
+   TP_ARGS(skb, location, reason, rx_sk),
 
TP_STRUCT__entry(
__field(void *, skbaddr)
__field(void *, location)
__field(unsigned short, protocol)
__field(enum skb_drop_reason,   reason)
+   __field(void *, rx_skaddr)
),
 
TP_fast_assign(
@@ -40,12 +41,14 @@ TRACE_EVENT(kfree_skb,
__entry->location = location;
__entry->protocol = ntohs(skb->protocol);
__entry->reason = reason;
+   __entry->rx_skaddr = rx_sk;
),
 
-   TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
+   TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s rx_skaddr=%p",
  __entry->skbaddr, __entry->protocol, __entry->location,
  __print_symbolic(__entry->reason,
-  DEFINE_DROP_REASON(FN, FNe)))
+  DEFINE_DROP_REASON(FN, FNe)),
+ __entry->rx_skaddr)
 );
 
 #undef FN
diff --git a/net/core/dev.c b/net/core/dev.c
index 85fe8138f3e4..7844227ecbfd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5233,7 +5233,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
trace_consume_skb(skb, net_tx_action);
else
trace_kfree_skb(skb, net_tx_action,
-   get_kfree_skb_cb(skb)->reason);
+   get_kfree_skb_cb(skb)->reason, NULL);
 
if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
__kfree_skb(skb);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 430ed18f8584..2e0ae3328232 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -109,7 +109,8 @@ static u32 net_dm_queue_len = 1000;
 struct net_dm_alert_ops {
void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
void *location,
-   enum skb_drop_reason reason);
+   enum skb_drop_reason reason,
+   struct sock *rx_sk);
void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
int work, int budget);
void (*work_item_func)(struct work_struct *work);
@@ -264,7 +265,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
 
 static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
void *location,
-   enum skb_drop_reason reason)
+   enum skb_drop_reason reason,
+   struct sock *rx_sk)
 {
trace_drop_common(skb, location);
 }
@@ -491,7 +493,8 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
 static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
  struct sk_buff *skb,
  void *location,
- enum skb_drop_reason reason)
+ enum skb_drop_reason reason,
+ struct sock *rx_sk)
 {
ktime_t tstamp = ktime_get_real();
struct per_cpu_dm_data *data;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466999a7515e..2854afdd713f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1203,7 +1203,7 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
if (reason == SKB_CONSUMED)
trace_consume_skb(skb, __builtin_return_address(0));
else

[RFC v3 net-next 0/7] net: pass receive socket to drop tracepoint

2024-06-04 Thread Yan Zhai
We set up our production packet drop monitoring around the kfree_skb
tracepoint. While this tracepoint is extremely valuable for diagnosing
critical problems, it also has some limitation with drops on the local
receive path: this tracepoint can only inspect the dropped skb itself,
but such skb might not carry enough information to:

1. determine in which netns/container this skb gets dropped
2. determine by which socket/service this skb ought to be received

The 1st issue is because skb->dev is the only member field with a valid
netns reference. But skb->dev can get cleared or reused. For example,
tcp_v4_rcv will clear skb->dev, and in later processing it might be reused
for the OFO tree.

The 2nd issue is because there is no reference on an skb that reliably
points to a receiving socket. skb->sk usually points to the local
sending socket, and it only points to a receive socket briefly after the
early demux stage, yet the socket can get stolen later. For certain drop
reasons like TCP OFO_MERGE, Zerowindow, or UDP PROTO_MEM errors, it
is hard to infer which receiving socket is impacted. This cannot be
overcome by simply looking at the packet header, because of
complications like sk lookup programs. In the past, single-purpose
tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full,
etc. were added as needed to provide more visibility. This could be
handled in a more generic way.

In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in
replacement for kfree_skb_reason at various local input paths. It accepts
an extra receiving-socket argument. Both issues above can be resolved
via this new argument.
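
For example, a probe attached to the extended tracepoint can attribute a
drop to a netns (the probe signature mirrors the drop_monitor callbacks
updated in patch 1/7; the probe body is only a sketch):

#include <linux/skbuff.h>
#include <net/sock.h>
#include <trace/events/skb.h>

static void demo_kfree_skb_probe(void *ignore, struct sk_buff *skb,
				 void *location, enum skb_drop_reason reason,
				 struct sock *rx_sk)
{
	/* rx_sk may be NULL when no receiving socket applies. */
	if (rx_sk)
		pr_info("drop reason=%d netns inum=%u\n",
			reason, sock_net(rx_sk)->ns.inum);
}

/* registered with: register_trace_kfree_skb(demo_kfree_skb_probe, NULL); */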

V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks;
Added a few missing report tags from test bots (also noticed by Dan
Carpenter and Simon Horman).

V1->V2: instead of using skb->cb, directly add the needed argument to
trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet
suggested.

V2: 
https://lore.kernel.org/linux-kernel/cover.1717206060.git@cloudflare.com/
V1: https://lore.kernel.org/netdev/cover.1717105215.git@cloudflare.com/

Yan Zhai (7):
  net: add rx_sk to trace_kfree_skb
  net: introduce sk_skb_reason_drop function
  ping: use sk_skb_reason_drop to free rx packets
  net: raw: use sk_skb_reason_drop to free rx packets
  tcp: use sk_skb_reason_drop to free rx packets
  udp: use sk_skb_reason_drop to free rx packets
  af_packet: use sk_skb_reason_drop to free rx packets

 include/linux/skbuff.h | 10 --
 include/trace/events/skb.h | 11 +++
 net/core/dev.c |  2 +-
 net/core/drop_monitor.c|  9 ++---
 net/core/skbuff.c  | 22 --
 net/ipv4/ping.c|  2 +-
 net/ipv4/raw.c |  4 ++--
 net/ipv4/syncookies.c  |  2 +-
 net/ipv4/tcp_input.c   |  2 +-
 net/ipv4/tcp_ipv4.c|  6 +++---
 net/ipv4/udp.c | 10 +-
 net/ipv6/raw.c |  8 
 net/ipv6/syncookies.c  |  2 +-
 net/ipv6/tcp_ipv6.c|  6 +++---
 net/ipv6/udp.c | 10 +-
 net/packet/af_packet.c | 10 +-
 16 files changed, 65 insertions(+), 51 deletions(-)

-- 
2.30.2





[PATCH 5/5] ftrace: Add comments to ftrace_hash_move() and friends

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Describe what ftrace_hash_move() does and add some more comments to some
other functions to make it easier to understand.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ftrace.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 021024164938..2d955f0c688f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -169,6 +169,7 @@ static inline void ftrace_ops_init(struct ftrace_ops *ops)
 #endif
 }
 
+/* Call this function for when a callback filters on set_ftrace_pid */
 static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
@@ -1317,7 +1318,7 @@ static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
return hash;
 }
 
-
+/* Used to save filters on functions for modules not loaded yet */
 static int ftrace_add_mod(struct trace_array *tr,
  const char *func, const char *module,
  int enable)
@@ -1431,6 +1432,7 @@ static struct ftrace_hash *__move_hash(struct ftrace_hash *src, int size)
return new_hash;
 }
 
+/* Move the @src entries to a newly allocated hash */
 static struct ftrace_hash *
 __ftrace_hash_move(struct ftrace_hash *src)
 {
@@ -1445,6 +1447,26 @@ __ftrace_hash_move(struct ftrace_hash *src)
return __move_hash(src, size);
 }
 
+/**
+ * ftrace_hash_move - move a new hash to a filter and do updates
+ * @ops: The ops with the hash that @dst points to
+ * @enable: True if for the filter hash, false for the notrace hash
+ * @dst: Points to the @ops hash that should be updated
+ * @src: The hash to update @dst with
+ *
+ * This is called when an ftrace_ops hash is being updated and the
+ * kernel needs to reflect this. Note, this only updates the kernel
+ * function callbacks if the @ops is enabled (not to be confused with
+ * @enable above). If the @ops is enabled, its hash determines what
+ * callbacks get called. This function gets called when the @ops hash
+ * is updated and it requires new callbacks.
+ *
+ * On success the elements of @src are moved to @dst, and @dst is updated
+ * properly, as well as the functions determined by the @ops hashes
+ * are now calling the @ops callback function.
+ *
+ * Regardless of the return value, @src should be freed with free_ftrace_hash().
+ */
 static int
 ftrace_hash_move(struct ftrace_ops *ops, int enable,
 struct ftrace_hash **dst, struct ftrace_hash *src)
-- 
2.43.0





[PATCH 4/5] ftrace: Convert "filter_hash" and "inc" to bool in ftrace_hash_rec_update_modify()

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The parameters "filter_hash" and "inc" in the function
ftrace_hash_rec_update_modify() are boolean. Change them to be such.

Also add documentation to what the function does.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ftrace.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index de652201c86c..021024164938 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1915,8 +1915,31 @@ static bool ftrace_hash_rec_enable(struct ftrace_ops *ops)
return __ftrace_hash_rec_update(ops, true, 1);
 }
 
+/*
+ * This function will update what functions @ops traces when its filter
+ * changes. @filter_hash is set to true when modifying the filter_hash
+ * and set to false when modifying the notrace_hash.
+ *
+ * For example, if the user does: echo schedule > set_ftrace_filter
+ * that would call: ftrace_hash_rec_update_modify(ops, true, true);
+ *
+ * For: echo schedule >> set_ftrace_notrace
+ * That would call: ftrace_hash_rec_update_modify(ops, false, true);
+ *
+ * The @inc states if the @ops callbacks are going to be added or removed.
+ * The dyn_ftrace records are updated via:
+ *
+ * ftrace_hash_rec_disable_modify(ops, filter_hash);
+ * ops->hash = new_hash
+ * ftrace_hash_rec_enable_modify(ops, filter_hash);
+ *
+ * Where the @ops is removed from all the records it is tracing using
+ * its old hash. The @ops hash is updated to the new hash, and then
+ * the @ops is added back to the records so that it is tracing all
+ * the new functions.
+ */
 static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
- int filter_hash, int inc)
+ bool filter_hash, bool inc)
 {
struct ftrace_ops *op;
 
@@ -1939,15 +1962,15 @@ static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
 }
 
 static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops,
-  int filter_hash)
+  bool filter_hash)
 {
-   ftrace_hash_rec_update_modify(ops, filter_hash, 0);
+   ftrace_hash_rec_update_modify(ops, filter_hash, false);
 }
 
 static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
- int filter_hash)
+ bool filter_hash)
 {
-   ftrace_hash_rec_update_modify(ops, filter_hash, 1);
+   ftrace_hash_rec_update_modify(ops, filter_hash, true);
 }
 
 /*
-- 
2.43.0





[PATCH 3/5] ftrace: Remove "filter_hash" parameter from ftrace_hash_rec_disable/enable()

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The functions ftrace_hash_rec_disable() and ftrace_hash_rec_enable()
always have 1 passed to their "filter_hash" parameter. Remove the parameter
and pass in true to __ftrace_hash_rec_update().

Also add some comments to both those functions explaining what they do.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ftrace.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 93c7c5fd4249..de652201c86c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1895,16 +1895,24 @@ static bool __ftrace_hash_rec_update(struct ftrace_ops 
*ops,
return update;
 }
 
-static bool ftrace_hash_rec_disable(struct ftrace_ops *ops,
-   int filter_hash)
+/*
+ * This is called when an ops is removed from tracing. It will decrement
+ * the counters of the dyn_ftrace records for all the functions that
+ * the @ops is attached to.
+ */
+static bool ftrace_hash_rec_disable(struct ftrace_ops *ops)
 {
-   return __ftrace_hash_rec_update(ops, filter_hash, 0);
+   return __ftrace_hash_rec_update(ops, true, 0);
 }
 
-static bool ftrace_hash_rec_enable(struct ftrace_ops *ops,
-  int filter_hash)
+/*
+ * This is called when an ops is added to tracing. It will increment
+ * the counters of the dyn_ftrace records for all the functions that
+ * the @ops is attached to.
+ */
+static bool ftrace_hash_rec_enable(struct ftrace_ops *ops)
 {
-   return __ftrace_hash_rec_update(ops, filter_hash, 1);
+   return __ftrace_hash_rec_update(ops, true, 1);
 }
 
 static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
@@ -3062,7 +3070,7 @@ int ftrace_startup(struct ftrace_ops *ops, int command)
return ret;
}
 
-   if (ftrace_hash_rec_enable(ops, 1))
+   if (ftrace_hash_rec_enable(ops))
command |= FTRACE_UPDATE_CALLS;
 
ftrace_startup_enable(command);
@@ -3104,7 +3112,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
/* Disabling ipmodify never fails */
ftrace_hash_ipmodify_disable(ops);
 
-   if (ftrace_hash_rec_disable(ops, 1))
+   if (ftrace_hash_rec_disable(ops))
command |= FTRACE_UPDATE_CALLS;
 
ops->flags &= ~FTRACE_OPS_FL_ENABLED;
-- 
2.43.0





[PATCH 1/5] ftrace: Rename dup_hash() and comment it

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The name "dup_hash()" is a misnomer as it does not duplicate the hash that
is passed in, but instead moves its entries from that hash to a newly
allocated one. Rename it to "__move_hash()" (using leading underscores as
it is an internal function), and add some comments about what it does.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ftrace.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da7e6abf48b4..9dcdefe9d1aa 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1391,7 +1391,11 @@ ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, 
int filter_hash);
 static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
   struct ftrace_hash *new_hash);
 
-static struct ftrace_hash *dup_hash(struct ftrace_hash *src, int size)
+/*
+ * Allocate a new hash and remove entries from @src and move them to the new 
hash.
+ * On success, the @src hash will be empty and should be freed.
+ */
+static struct ftrace_hash *__move_hash(struct ftrace_hash *src, int size)
 {
struct ftrace_func_entry *entry;
struct ftrace_hash *new_hash;
@@ -1438,7 +1442,7 @@ __ftrace_hash_move(struct ftrace_hash *src)
if (ftrace_hash_empty(src))
return EMPTY_HASH;
 
-   return dup_hash(src, size);
+   return __move_hash(src, size);
 }
 
 static int
-- 
2.43.0





[PATCH 0/5] ftrace: Clean up and comment code

2024-06-04 Thread Steven Rostedt
While working on the function_graph multiple users code, I realized
that I was struggling with how the ftrace code worked. Even being the
author of the code did not make it intuitive. Namely, the
function names were not descriptive enough, or at least, they needed
comments.

This series moves to solve some of that via changing a couple function
names and parameters and adding comments to many of them.

There's more to do, but this at least moves it in the right direction.


Steven Rostedt (Google) (5):
  ftrace: Rename dup_hash() and comment it
  ftrace: Comment __ftrace_hash_rec_update() and make filter_hash bool
  ftrace: Remove "filter_hash" parameter from 
ftrace_hash_rec_disable/enable()
  ftrace: Convert "filter_hash" and "inc" to bool in 
ftrace_hash_rec_update_modify()
  ftrace: Add comments to ftrace_hash_move() and friends


 kernel/trace/ftrace.c | 103 +-
 1 file changed, 86 insertions(+), 17 deletions(-)



[PATCH 2/5] ftrace: Comment __ftrace_hash_rec_update() and make filter_hash bool

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The function __ftrace_hash_rec_update() parameter "filter_hash" is only
used for true or false (boolean), but is of type int. It already has an
"inc" parameter that is boolean. This is confusing, make "filter_hash"
boolean as well.

While at it, add some documentation to that function, especially since
it holds the guts of the filtering logic of ftrace.
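
For readers new to this area, here is a simplified sketch of the rule
the two hashes encode for a single function address (ops_wants_ip() is
an illustrative name, not a kernel function; the real check is
hash_contains_ip(), and locking/RCU are omitted here):

static bool ops_wants_ip(struct ftrace_ops *ops, unsigned long ip)
{
	struct ftrace_hash *filter  = ops->func_hash->filter_hash;
	struct ftrace_hash *notrace = ops->func_hash->notrace_hash;

	/* an empty filter_hash means "trace everything" */
	if (!ftrace_hash_empty(filter) && !ftrace_lookup_ip(filter, ip))
		return false;

	return !ftrace_lookup_ip(notrace, ip);
}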

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ftrace.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9dcdefe9d1aa..93c7c5fd4249 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1701,8 +1701,20 @@ static bool skip_record(struct dyn_ftrace *rec)
!(rec->flags & FTRACE_FL_ENABLED);
 }
 
+/*
+ * This is the main engine to the ftrace updates.
+ *
+ * It will iterate through all the available ftrace functions
+ * (the ones that ftrace can have callbacks to) and set the flags
+ * to the associated dyn_ftrace records.
+ *
+ * @filter_hash: True if the filter hash is updated, false for the
+ *   notrace hash
+ * @inc: True to add this hash, false to remove it (increment the
+ *   record counters or decrement them).
+ */
 static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
-int filter_hash,
+bool filter_hash,
 bool inc)
 {
struct ftrace_hash *hash;
-- 
2.43.0





Re: (subset) [PATCH 0/7] Use mboxes instead of syscon for APCS (arm32 & arm64)

2024-06-04 Thread Bjorn Andersson


On Wed, 24 Apr 2024 18:23:53 +0200, Luca Weiss wrote:
> The first patch is for removing a bogus error warning I've noticed while
> developing this on msm8226 - there the patches are also coming later for
> this SoC since apcs is getting hooked up to cpufreq there also.
> 
> Apart from usages from the qcom,smsm driver (patches coming!) all other
> usages of the apcs mailbox now go via the mailbox driver - where one is
> used, so some arm32 boards will continue using "qcom,ipc*" properties in
> the short or long term.
> 
> [...]

Applied, thanks!

[3/7] arm64: dts: qcom: msm8916: Use mboxes properties for APCS
  commit: 3e971470619d80dd343e3abd80cb997bcb48f200
[4/7] arm64: dts: qcom: msm8939: Use mboxes properties for APCS
  commit: 22e4e43484c4dd1f29a72cc62411072758e0681a
[5/7] arm64: dts: qcom: msm8953: Use mboxes properties for APCS
  commit: 11dff973ebe21950c7c5221919141fb0cb16354e
[6/7] arm64: dts: qcom: msm8976: Use mboxes properties for APCS
  commit: a3d5570d8c8c6efc3d15d015b517f4e8bd11898f
[7/7] arm64: dts: qcom: msm8994: Use mboxes properties for APCS
  commit: ba5d9a91f8c3caf6867b3b87dce080d056222561

Best regards,
-- 
Bjorn Andersson 



Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support

2024-06-04 Thread Bjorn Andersson
On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote:
> It is possible that remote processor is already running before
> linux boot or remoteproc platform driver probe. Implement required
> remoteproc framework ops to provide resource table address and
> connect or disconnect with remote processor in such case.
> 

I know it changes the current style for this driver, but could we drop
"drivers: " from the subject prefix, to align changes to this driver
with others?

Regards,
Bjorn



[RFC bpf-next 10/10] selftests/bpf: Add uprobe session recursive test

2024-06-04 Thread Jiri Olsa
Adding uprobe session test that verifies the cookie value is stored
properly when a single uprobe-ed function is executed recursively.

Signed-off-by: Jiri Olsa 
---
 .../bpf/prog_tests/uprobe_multi_test.c| 57 +++
 .../progs/uprobe_multi_session_recursive.c| 44 ++
 2 files changed, 101 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c 
b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index 34671253e130..cdd7c327e16d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -7,6 +7,7 @@
 #include "uprobe_multi_usdt.skel.h"
 #include "uprobe_multi_session.skel.h"
 #include "uprobe_multi_session_cookie.skel.h"
+#include "uprobe_multi_session_recursive.skel.h"
 #include "bpf/libbpf_internal.h"
 #include "testing_helpers.h"
 
@@ -27,6 +28,12 @@ noinline void uprobe_multi_func_3(void)
asm volatile ("");
 }
 
+noinline void uprobe_session_recursive(int i)
+{
+   if (i)
+   uprobe_session_recursive(i - 1);
+}
+
 struct child {
int go[2];
int pid;
@@ -587,6 +594,54 @@ static void test_session_cookie_skel_api(void)
uprobe_multi_session_cookie__destroy(skel);
 }
 
+static void test_session_recursive_skel_api(void)
+{
+   struct uprobe_multi_session_recursive *skel = NULL;
+   int i, err;
+
+   skel = uprobe_multi_session_recursive__open_and_load();
+   if (!ASSERT_OK_PTR(skel, 
"uprobe_multi_session_recursive__open_and_load"))
+   goto cleanup;
+
+   skel->bss->pid = getpid();
+
+   err = uprobe_multi_session_recursive__attach(skel);
+   if (!ASSERT_OK(err, "uprobe_multi_session_recursive__attach"))
+   goto cleanup;
+
+   for (i = 0; i < ARRAY_SIZE(skel->bss->test_uprobe_cookie_entry); i++)
+   skel->bss->test_uprobe_cookie_entry[i] = i + 1;
+
+   uprobe_session_recursive(5);
+
+   /*
+* entry uprobe:
+* uprobe_session_recursive(5) { *cookie = 1, return 0
+*   uprobe_session_recursive(4) {   *cookie = 2, return 1
+* uprobe_session_recursive(3) { *cookie = 3, return 0
+*   uprobe_session_recursive(2) {   *cookie = 4, return 1
+* uprobe_session_recursive(1) { *cookie = 5, return 0
+*   uprobe_session_recursive(0) {   *cookie = 6, return 1
+*  return uprobe:
+*   } i = 0  not executed
+* } i = 1
test_uprobe_cookie_return[0] = 5
+*   } i = 2  not executed
+* } i = 3
test_uprobe_cookie_return[1] = 3
+*   } i = 4  not executed
+* } i = 5
test_uprobe_cookie_return[2] = 1
+*/
+
+   ASSERT_EQ(skel->bss->idx_entry, 6, "idx_entry");
+   ASSERT_EQ(skel->bss->idx_return, 3, "idx_return");
+
+   ASSERT_EQ(skel->bss->test_uprobe_cookie_return[0], 5, 
"test_uprobe_cookie_return[0]");
+   ASSERT_EQ(skel->bss->test_uprobe_cookie_return[1], 3, 
"test_uprobe_cookie_return[1]");
+   ASSERT_EQ(skel->bss->test_uprobe_cookie_return[2], 1, 
"test_uprobe_cookie_return[2]");
+
+cleanup:
+   uprobe_multi_session_recursive__destroy(skel);
+}
+
 static void test_bench_attach_uprobe(void)
 {
long attach_start_ns = 0, attach_end_ns = 0;
@@ -681,4 +736,6 @@ void test_uprobe_multi_test(void)
test_session_error_multiple_instances();
if (test__start_subtest("session_cookie"))
test_session_cookie_skel_api();
+   if (test__start_subtest("session_cookie_recursive"))
+   test_session_recursive_skel_api();
 }
diff --git a/tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c 
b/tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c
new file mode 100644
index ..7babc180c28f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+#include "bpf_kfuncs.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+
+int idx_entry = 0;
+int idx_return = 0;
+
+__u64 test_uprobe_cookie_entry[6];
+__u64 test_uprobe_cookie_return[3];
+
+static int check_cookie(void)
+{
+   long *cookie = bpf_session_cookie();
+
+   if (bpf_session_is_return()) {
+   if (idx_return >= ARRAY_SIZE(test_uprobe_cookie_return))
+   return 1;
+   test_uprobe_cookie_return[idx_return++] = *cookie;
+   return 0;
+   }
+
+ 

[RFC bpf-next 09/10] selftests/bpf: Add uprobe session cookie test

2024-06-04 Thread Jiri Olsa
Adding uprobe session test that verifies the cookie value
gets properly propagated from the entry to the return program.

Signed-off-by: Jiri Olsa 
---
 .../bpf/prog_tests/uprobe_multi_test.c| 31 
 .../bpf/progs/uprobe_multi_session_cookie.c   | 50 +++
 2 files changed, 81 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c 
b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index 4bff681f0d7d..34671253e130 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -6,6 +6,7 @@
 #include "uprobe_multi_bench.skel.h"
 #include "uprobe_multi_usdt.skel.h"
 #include "uprobe_multi_session.skel.h"
+#include "uprobe_multi_session_cookie.skel.h"
 #include "bpf/libbpf_internal.h"
 #include "testing_helpers.h"
 
@@ -558,6 +559,34 @@ static void test_session_error_multiple_instances(void)
uprobe_multi_session__destroy(skel_2);
 }
 
+static void test_session_cookie_skel_api(void)
+{
+   struct uprobe_multi_session_cookie *skel = NULL;
+   int err;
+
+   skel = uprobe_multi_session_cookie__open_and_load();
+   if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load"))
+   goto cleanup;
+
+   skel->bss->pid = getpid();
+
+   err = uprobe_multi_session_cookie__attach(skel);
+   if (!ASSERT_OK(err, " kprobe_multi_session__attach"))
+   goto cleanup;
+
+   /* trigger all probes */
+   uprobe_multi_func_1();
+   uprobe_multi_func_2();
+   uprobe_multi_func_3();
+
+   ASSERT_EQ(skel->bss->test_uprobe_1_result, 1, "test_uprobe_1_result");
+   ASSERT_EQ(skel->bss->test_uprobe_2_result, 2, "test_uprobe_2_result");
+   ASSERT_EQ(skel->bss->test_uprobe_3_result, 3, "test_uprobe_3_result");
+
+cleanup:
+   uprobe_multi_session_cookie__destroy(skel);
+}
+
 static void test_bench_attach_uprobe(void)
 {
long attach_start_ns = 0, attach_end_ns = 0;
@@ -650,4 +679,6 @@ void test_uprobe_multi_test(void)
test_session_skel_api();
if (test__start_subtest("session_errors"))
test_session_error_multiple_instances();
+   if (test__start_subtest("session_cookie"))
+   test_session_cookie_skel_api();
 }
diff --git a/tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c 
b/tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c
new file mode 100644
index ..ea41503fad18
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+#include "bpf_kfuncs.h"
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+
+__u64 test_uprobe_1_result = 0;
+__u64 test_uprobe_2_result = 0;
+__u64 test_uprobe_3_result = 0;
+
+static int check_cookie(__u64 val, __u64 *result)
+{
+   long *cookie;
+
+   if (bpf_get_current_pid_tgid() >> 32 != pid)
+   return 1;
+
+   cookie = bpf_session_cookie();
+
+   if (bpf_session_is_return()) {
+   *result = *cookie == val ? val : 0;
+   } else {
+   *cookie = val;
+   }
+
+   return 0;
+}
+
+SEC("uprobe.session//proc/self/exe:uprobe_multi_func_1")
+int uprobe_1(struct pt_regs *ctx)
+{
   return check_cookie(1, &test_uprobe_1_result);
+}
+
+SEC("uprobe.session//proc/self/exe:uprobe_multi_func_2")
+int uprobe_2(struct pt_regs *ctx)
+{
   return check_cookie(2, &test_uprobe_2_result);
+}
+
+SEC("uprobe.session//proc/self/exe:uprobe_multi_func_3")
+int uprobe_3(struct pt_regs *ctx)
+{
   return check_cookie(3, &test_uprobe_3_result);
+}
-- 
2.45.1




[RFC bpf-next 08/10] selftests/bpf: Add uprobe session errors test

2024-06-04 Thread Jiri Olsa
Adding uprobe session test to check that just a single
session instance is allowed for a single uprobe.

Signed-off-by: Jiri Olsa 
---
 .../bpf/prog_tests/uprobe_multi_test.c| 27 +++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c 
b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index fddca2597818..4bff681f0d7d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -533,6 +533,31 @@ static void test_session_skel_api(void)
uprobe_multi_session__destroy(skel);
 }
 
+static void test_session_error_multiple_instances(void)
+{
+   struct uprobe_multi_session *skel_1 = NULL, *skel_2 = NULL;
+   int err;
+
+   skel_1 = uprobe_multi_session__open_and_load();
+   if (!ASSERT_OK_PTR(skel_1, "fentry_raw_skel_load"))
+   goto cleanup;
+
+   err = uprobe_multi_session__attach(skel_1);
+   if (!ASSERT_OK(err, " kprobe_multi_session__attach"))
+   goto cleanup;
+
+   skel_2 = uprobe_multi_session__open_and_load();
+   if (!ASSERT_OK_PTR(skel_2, "fentry_raw_skel_load"))
+   goto cleanup;
+
+   err = uprobe_multi_session__attach(skel_2);
+   ASSERT_EQ(err, -EBUSY, " kprobe_multi_session__attach");
+
+cleanup:
+   uprobe_multi_session__destroy(skel_1);
+   uprobe_multi_session__destroy(skel_2);
+}
+
 static void test_bench_attach_uprobe(void)
 {
long attach_start_ns = 0, attach_end_ns = 0;
@@ -623,4 +648,6 @@ void test_uprobe_multi_test(void)
test_attach_api_fails();
if (test__start_subtest("session"))
test_session_skel_api();
+   if (test__start_subtest("session_errors"))
+   test_session_error_multiple_instances();
 }
-- 
2.45.1




[RFC bpf-next 07/10] selftests/bpf: Add uprobe session test

2024-06-04 Thread Jiri Olsa
Adding uprobe session test and testing that the entry program
return value controls execution of the return probe program.

Signed-off-by: Jiri Olsa 
---
 .../bpf/prog_tests/uprobe_multi_test.c| 38 ++
 .../bpf/progs/uprobe_multi_session.c  | 52 +++
 2 files changed, 90 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_multi_session.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c 
b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index 8269cdee33ae..fddca2597818 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -5,6 +5,7 @@
 #include "uprobe_multi.skel.h"
 #include "uprobe_multi_bench.skel.h"
 #include "uprobe_multi_usdt.skel.h"
+#include "uprobe_multi_session.skel.h"
 #include "bpf/libbpf_internal.h"
 #include "testing_helpers.h"
 
@@ -497,6 +498,41 @@ static void test_link_api(void)
__test_link_api(child);
 }
 
+static void test_session_skel_api(void)
+{
+   struct uprobe_multi_session *skel = NULL;
+   LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
+   struct bpf_link *link = NULL;
+   int err;
+
+   skel = uprobe_multi_session__open_and_load();
+   if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load"))
+   goto cleanup;
+
+   skel->bss->pid = getpid();
+
+   err = uprobe_multi_session__attach(skel);
+   if (!ASSERT_OK(err, " uprobe_multi_session__attach"))
+   goto cleanup;
+
+   /* trigger all probes */
+   skel->bss->uprobe_multi_func_1_addr = (__u64) uprobe_multi_func_1;
+   skel->bss->uprobe_multi_func_2_addr = (__u64) uprobe_multi_func_2;
+   skel->bss->uprobe_multi_func_3_addr = (__u64) uprobe_multi_func_3;
+
+   uprobe_multi_func_1();
+   uprobe_multi_func_2();
+   uprobe_multi_func_3();
+
+   ASSERT_EQ(skel->bss->uprobe_session_result[0], 1, 
"uprobe_multi_func_1_result");
+   ASSERT_EQ(skel->bss->uprobe_session_result[1], 2, 
"uprobe_multi_func_1_result");
+   ASSERT_EQ(skel->bss->uprobe_session_result[2], 1, 
"uprobe_multi_func_1_result");
+
+cleanup:
+   bpf_link__destroy(link);
+   uprobe_multi_session__destroy(skel);
+}
+
 static void test_bench_attach_uprobe(void)
 {
long attach_start_ns = 0, attach_end_ns = 0;
@@ -585,4 +621,6 @@ void test_uprobe_multi_test(void)
test_bench_attach_usdt();
if (test__start_subtest("attach_api_fails"))
test_attach_api_fails();
+   if (test__start_subtest("session"))
+   test_session_skel_api();
 }
diff --git a/tools/testing/selftests/bpf/progs/uprobe_multi_session.c 
b/tools/testing/selftests/bpf/progs/uprobe_multi_session.c
new file mode 100644
index ..b382d7d29475
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_multi_session.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+#include "bpf_kfuncs.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+__u64 uprobe_multi_func_1_addr = 0;
+__u64 uprobe_multi_func_2_addr = 0;
+__u64 uprobe_multi_func_3_addr = 0;
+
+__u64 uprobe_session_result[3];
+
+int pid = 0;
+
+static int uprobe_multi_check(void *ctx, bool is_return)
+{
+   const __u64 funcs[] = {
+   uprobe_multi_func_1_addr,
+   uprobe_multi_func_2_addr,
+   uprobe_multi_func_3_addr,
+   };
+   unsigned int i;
+   __u64 addr;
+
+   if (bpf_get_current_pid_tgid() >> 32 != pid)
+   return 1;
+
+   addr = bpf_get_func_ip(ctx);
+
+   for (i = 0; i < ARRAY_SIZE(funcs); i++) {
+   if (funcs[i] == addr) {
+   uprobe_session_result[i]++;
+   break;
+}
+}
+
+   if ((addr == uprobe_multi_func_1_addr) ||
+   (addr == uprobe_multi_func_3_addr))
+   return 1;
+
+   return 0;
+}
+
+SEC("uprobe.session//proc/self/exe:uprobe_multi_func_*")
+int uprobe(struct pt_regs *ctx)
+{
+   return uprobe_multi_check(ctx, bpf_session_is_return());
+}
-- 
2.45.1




[RFC bpf-next 06/10] selftests/bpf: Move ARRAY_SIZE to bpf_misc.h

2024-06-04 Thread Jiri Olsa
ARRAY_SIZE is used in multiple places, so move its definition into
the bpf_misc.h header.
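
One subtle difference worth noting: several of the removed local
definitions cast the result to int (iters.c, kprobe_multi_session.c,
profiler.inc.h, linked_list.c), while the bpf_misc.h version does not,
so the shared macro now yields an unsigned size_t. That only shows up
in signed comparisons, e.g. (a hypothetical caveat example):

	__u64 vals[4];
	int i;

	/* signed vs unsigned comparison now; -Wsign-compare builds
	 * will warn where the old (int) cast kept this signed */
	for (i = 0; i < ARRAY_SIZE(vals); i++)
		vals[i] = i;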

Signed-off-by: Jiri Olsa 
---
 tools/testing/selftests/bpf/progs/bpf_misc.h | 2 ++
 tools/testing/selftests/bpf/progs/iters.c| 2 --
 tools/testing/selftests/bpf/progs/kprobe_multi_session.c | 3 +--
 tools/testing/selftests/bpf/progs/linked_list.c  | 5 +
 tools/testing/selftests/bpf/progs/netif_receive_skb.c| 5 +
 tools/testing/selftests/bpf/progs/profiler.inc.h | 5 +
 tools/testing/selftests/bpf/progs/setget_sockopt.c   | 5 +
 tools/testing/selftests/bpf/progs/test_bpf_ma.c  | 4 
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c| 5 +
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c| 5 +
 tools/testing/selftests/bpf/progs/test_sysctl_prog.c | 5 +
 .../testing/selftests/bpf/progs/test_tcp_custom_syncookie.c  | 1 +
 .../testing/selftests/bpf/progs/test_tcp_custom_syncookie.h  | 2 --
 .../testing/selftests/bpf/progs/verifier_subprog_precision.c | 2 --
 14 files changed, 11 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h 
b/tools/testing/selftests/bpf/progs/bpf_misc.h
index fb2f5513e29e..70f56af94513 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -135,4 +135,6 @@
 /* make it look to compiler like value is read and written */
 #define __sink(expr) asm volatile("" : "+g"(expr))
 
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
 #endif
diff --git a/tools/testing/selftests/bpf/progs/iters.c 
b/tools/testing/selftests/bpf/progs/iters.c
index fe65e0952a1e..16bdc3e25591 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -7,8 +7,6 @@
 #include "bpf_misc.h"
 #include "bpf_compiler.h"
 
-#define ARRAY_SIZE(x) (int)(sizeof(x) / sizeof((x)[0]))
-
 static volatile int zero = 0;
 
 int my_pid;
diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi_session.c 
b/tools/testing/selftests/bpf/progs/kprobe_multi_session.c
index bbba9eb46551..bd8b7fb7061e 100644
--- a/tools/testing/selftests/bpf/progs/kprobe_multi_session.c
+++ b/tools/testing/selftests/bpf/progs/kprobe_multi_session.c
@@ -4,8 +4,7 @@
 #include 
 #include 
 #include "bpf_kfuncs.h"
-
-#define ARRAY_SIZE(x) (int)(sizeof(x) / sizeof((x)[0]))
+#include "bpf_misc.h"
 
 char _license[] SEC("license") = "GPL";
 
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c 
b/tools/testing/selftests/bpf/progs/linked_list.c
index f69bf3e30321..421f40835acd 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -4,10 +4,7 @@
 #include 
 #include 
 #include "bpf_experimental.h"
-
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (int)(sizeof(x) / sizeof((x)[0]))
-#endif
+#include "bpf_misc.h"
 
 #include "linked_list.h"
 
diff --git a/tools/testing/selftests/bpf/progs/netif_receive_skb.c 
b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
index c0062645fc68..9e067dcbf607 100644
--- a/tools/testing/selftests/bpf/progs/netif_receive_skb.c
+++ b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include "bpf_misc.h"
 
 #include 
 
@@ -23,10 +24,6 @@ bool skip = false;
 #define BADPTR 0
 #endif
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x)  (sizeof(x) / sizeof((x)[0]))
-#endif
-
 struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__uint(max_entries, 1);
diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h 
b/tools/testing/selftests/bpf/progs/profiler.inc.h
index 6957d9f2805e..8bd1ebd7d6af 100644
--- a/tools/testing/selftests/bpf/progs/profiler.inc.h
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -9,6 +9,7 @@
 #include "err.h"
 #include "bpf_experimental.h"
 #include "bpf_compiler.h"
+#include "bpf_misc.h"
 
 #ifndef NULL
 #define NULL 0
@@ -133,10 +134,6 @@ struct {
__uint(max_entries, 16);
 } disallowed_exec_inodes SEC(".maps");
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(arr) (int)(sizeof(arr) / sizeof(arr[0]))
-#endif
-
 static INLINE bool IS_ERR(const void* ptr)
 {
return IS_ERR_VALUE((unsigned long)ptr);
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c 
b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 7a438600ae98..60518aed1ffc 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -6,10 +6,7 @@
 #include 
 #include 
 #include 
-
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
+#include "bpf_misc.h"
 
 extern unsigned long CONFIG_HZ __kconfig;
 
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_ma.c 
b/tools/testing/selftests/bpf/progs/test_bpf_ma.c
index 3494ca30fa7f..4a4e0b8d9b72 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_ma.c
+++ 

[RFC bpf-next 05/10] libbpf: Add uprobe session attach type names to attach_type_name

2024-06-04 Thread Jiri Olsa
Adding the uprobe session attach type name to attach_type_name,
so libbpf_bpf_attach_type_str() returns the proper string name for
the BPF_TRACE_UPROBE_SESSION attach type.
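
A quick illustration of what this enables:

	const char *name;

	name = libbpf_bpf_attach_type_str(BPF_TRACE_UPROBE_SESSION);
	/* with this patch, name is "trace_uprobe_session";
	 * without it, NULL is returned for this attach type */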

Signed-off-by: Jiri Olsa 
---
 tools/lib/bpf/libbpf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a008a708..702c2fb7e4df 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -133,6 +133,7 @@ static const char * const attach_type_name[] = {
[BPF_NETKIT_PRIMARY]= "netkit_primary",
[BPF_NETKIT_PEER]   = "netkit_peer",
[BPF_TRACE_KPROBE_SESSION]  = "trace_kprobe_session",
+   [BPF_TRACE_UPROBE_SESSION]  = "trace_uprobe_session",
 };
 
 static const char * const link_type_name[] = {
-- 
2.45.1




[RFC bpf-next 04/10] libbpf: Add support for uprobe multi session attach

2024-06-04 Thread Jiri Olsa
Adding support to attach a program in uprobe session mode
with the bpf_program__attach_uprobe_multi() function.

Adding a session bool to the bpf_uprobe_multi_opts struct that
tells bpf_program__attach_uprobe_multi() to create a uprobe
multi session attachment.

Also adding a new program loader section that allows:
SEC("uprobe.session/bpf_fentry_test*")

which loads/attaches the uprobe program as a uprobe session.
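
For the explicit API path, usage would look roughly like this (a
sketch; prog, the binary path and the function name are placeholders):

	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
		.session = true,
	);
	struct bpf_link *link;

	link = bpf_program__attach_uprobe_multi(prog, -1 /* any pid */,
						"/proc/self/exe",
						"my_function", &opts);
	if (!link)
		return -errno;	/* errno is set on failure */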

Signed-off-by: Jiri Olsa 
---
 tools/lib/bpf/bpf.c|  1 +
 tools/lib/bpf/libbpf.c | 50 --
 tools/lib/bpf/libbpf.h |  4 +++-
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 2a4c71501a17..becdfa701c75 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -776,6 +776,7 @@ int bpf_link_create(int prog_fd, int target_fd,
return libbpf_err(-EINVAL);
break;
case BPF_TRACE_UPROBE_MULTI:
+   case BPF_TRACE_UPROBE_SESSION:
attr.link_create.uprobe_multi.flags = OPTS_GET(opts, 
uprobe_multi.flags, 0);
attr.link_create.uprobe_multi.cnt = OPTS_GET(opts, 
uprobe_multi.cnt, 0);
attr.link_create.uprobe_multi.path = ptr_to_u64(OPTS_GET(opts, 
uprobe_multi.path, 0));
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d1627a2ca30b..a008a708 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -9328,6 +9328,7 @@ static int attach_trace(const struct bpf_program *prog, 
long cookie, struct bpf_
 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, 
struct bpf_link **link);
 static int attach_kprobe_session(const struct bpf_program *prog, long cookie, 
struct bpf_link **link);
 static int attach_uprobe_multi(const struct bpf_program *prog, long cookie, 
struct bpf_link **link);
+static int attach_uprobe_session(const struct bpf_program *prog, long cookie, 
struct bpf_link **link);
 static int attach_lsm(const struct bpf_program *prog, long cookie, struct 
bpf_link **link);
 static int attach_iter(const struct bpf_program *prog, long cookie, struct 
bpf_link **link);
 
@@ -9346,6 +9347,7 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("kprobe.session+",  KPROBE, BPF_TRACE_KPROBE_SESSION, 
SEC_NONE, attach_kprobe_session),
SEC_DEF("uprobe.multi+",KPROBE, BPF_TRACE_UPROBE_MULTI, 
SEC_NONE, attach_uprobe_multi),
SEC_DEF("uretprobe.multi+", KPROBE, BPF_TRACE_UPROBE_MULTI, 
SEC_NONE, attach_uprobe_multi),
+   SEC_DEF("uprobe.session+",  KPROBE, BPF_TRACE_UPROBE_SESSION, 
SEC_NONE, attach_uprobe_session),
SEC_DEF("uprobe.multi.s+",  KPROBE, BPF_TRACE_UPROBE_MULTI, 
SEC_SLEEPABLE, attach_uprobe_multi),
SEC_DEF("uretprobe.multi.s+",   KPROBE, BPF_TRACE_UPROBE_MULTI, 
SEC_SLEEPABLE, attach_uprobe_multi),
SEC_DEF("ksyscall+",KPROBE, 0, SEC_NONE, attach_ksyscall),
@@ -11682,6 +11684,40 @@ static int attach_uprobe_multi(const struct 
bpf_program *prog, long cookie, stru
return ret;
 }
 
+static int attach_uprobe_session(const struct bpf_program *prog, long cookie, 
struct bpf_link **link)
+{
+   char *binary_path = NULL, *func_name = NULL;
+   LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+   .session = true,
+   );
+   int n, ret = -EINVAL;
+   const char *spec;
+
+   *link = NULL;
+
+   spec = prog->sec_name + sizeof("uprobe.session/") - 1;
+   n = sscanf(spec, "%m[^:]:%m[^\n]",
+  &binary_path, &func_name);
+
+   switch (n) {
+   case 1:
+   /* but auto-attach is impossible. */
+   ret = 0;
+   break;
+   case 2:
+   *link = bpf_program__attach_uprobe_multi(prog, -1, binary_path, 
func_name, &opts);
+   ret = *link ? 0 : -errno;
+   break;
+   default:
+   pr_warn("prog '%s': invalid format of section definition 
'%s'\n", prog->name,
+   prog->sec_name);
+   break;
+   }
+   free(binary_path);
+   free(func_name);
+   return ret;
+}
+
 static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz,
 const char *binary_path, uint64_t 
offset)
 {
@@ -11916,10 +11952,12 @@ bpf_program__attach_uprobe_multi(const struct 
bpf_program *prog,
const unsigned long *ref_ctr_offsets = NULL, *offsets = NULL;
LIBBPF_OPTS(bpf_link_create_opts, lopts);
unsigned long *resolved_offsets = NULL;
+   enum bpf_attach_type attach_type;
int err = 0, link_fd, prog_fd;
struct bpf_link *link = NULL;
char errmsg[STRERR_BUFSIZE];
char full_path[PATH_MAX];
+   bool retprobe, session;
const __u64 *cookies;
const char **syms;
size_t cnt;
@@ -11990,12 +12028,20 @@ bpf_program__attach_uprobe_multi(const struct 
bpf_program *prog,
offsets = resolved_offsets;
  

[RFC bpf-next 03/10] bpf: Add support for uprobe multi session context

2024-06-04 Thread Jiri Olsa
Placing a bpf_session_run_ctx layer in between bpf_run_ctx and
bpf_uprobe_multi_run_ctx, so the session data can be retrieved
from the uprobe_multi link.

Plus granting the session kfuncs access to uprobe session programs.
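
The net effect for program authors is sketched below, using the
section format added later in this series (my_func is a placeholder):

	SEC("uprobe.session//proc/self/exe:my_func")
	int prog(struct pt_regs *ctx)
	{
		/* same cookie pointer on entry and return of one call */
		long *cookie = bpf_session_cookie();

		if (!bpf_session_is_return())
			*cookie = 42;	/* entry: stash per-call data */
		else
			bpf_printk("entry stored %ld", *cookie);
		return 0;
	}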

Signed-off-by: Jiri Olsa 
---
 kernel/trace/bpf_trace.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 53b111c8e887..4392807ee8d9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3181,7 +3181,7 @@ struct bpf_uprobe_multi_link {
 };
 
 struct bpf_uprobe_multi_run_ctx {
-   struct bpf_run_ctx run_ctx;
+   struct bpf_session_run_ctx session_ctx;
unsigned long entry_ip;
struct bpf_uprobe *uprobe;
 };
@@ -3294,10 +3294,15 @@ static const struct bpf_link_ops 
bpf_uprobe_multi_link_lops = {
 
 static int uprobe_prog_run(struct bpf_uprobe *uprobe,
   unsigned long entry_ip,
-  struct pt_regs *regs)
+  struct pt_regs *regs,
+  bool is_return, void *data)
 {
struct bpf_uprobe_multi_link *link = uprobe->link;
struct bpf_uprobe_multi_run_ctx run_ctx = {
+   .session_ctx = {
+   .is_return = is_return,
+   .data = data,
+   },
.entry_ip = entry_ip,
.uprobe = uprobe,
};
@@ -3316,7 +3321,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
 
migrate_disable();
 
-   old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+   old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
 
@@ -3345,7 +3350,7 @@ uprobe_multi_link_handler(struct uprobe_consumer *con, 
struct pt_regs *regs)
struct bpf_uprobe *uprobe;
 
uprobe = container_of(con, struct bpf_uprobe, consumer);
-   return uprobe_prog_run(uprobe, instruction_pointer(regs), regs);
+   return uprobe_prog_run(uprobe, instruction_pointer(regs), regs, false, 
NULL);
 }
 
 static int
@@ -3354,7 +3359,7 @@ uprobe_multi_link_ret_handler(struct uprobe_consumer 
*con, unsigned long func, s
struct bpf_uprobe *uprobe;
 
uprobe = container_of(con, struct bpf_uprobe, consumer);
-   return uprobe_prog_run(uprobe, func, regs);
+   return uprobe_prog_run(uprobe, func, regs, true, NULL);
 }
 
 static int
@@ -3364,7 +3369,7 @@ uprobe_multi_link_handler_session(struct uprobe_consumer 
*con, struct pt_regs *r
struct bpf_uprobe *uprobe;
 
uprobe = container_of(con, struct bpf_uprobe, consumer);
-   return uprobe_prog_run(uprobe, instruction_pointer(regs), regs);
+   return uprobe_prog_run(uprobe, instruction_pointer(regs), regs, false, 
data);
 }
 
 static int
@@ -3374,14 +3379,14 @@ uprobe_multi_link_ret_handler_session(struct 
uprobe_consumer *con, unsigned long
struct bpf_uprobe *uprobe;
 
uprobe = container_of(con, struct bpf_uprobe, consumer);
-   return uprobe_prog_run(uprobe, func, regs);
+   return uprobe_prog_run(uprobe, func, regs, true, data);
 }
 
 static u64 bpf_uprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 {
struct bpf_uprobe_multi_run_ctx *run_ctx;
 
-   run_ctx = container_of(current->bpf_ctx, struct 
bpf_uprobe_multi_run_ctx, run_ctx);
+   run_ctx = container_of(current->bpf_ctx, struct 
bpf_uprobe_multi_run_ctx, session_ctx.run_ctx);
return run_ctx->entry_ip;
 }
 
@@ -3389,7 +3394,7 @@ static u64 bpf_uprobe_multi_cookie(struct bpf_run_ctx 
*ctx)
 {
struct bpf_uprobe_multi_run_ctx *run_ctx;
 
-   run_ctx = container_of(current->bpf_ctx, struct 
bpf_uprobe_multi_run_ctx, run_ctx);
+   run_ctx = container_of(current->bpf_ctx, struct 
bpf_uprobe_multi_run_ctx, session_ctx.run_ctx);
return run_ctx->uprobe->cookie;
 }
 
@@ -3586,7 +3591,8 @@ static int bpf_kprobe_multi_filter(const struct bpf_prog 
*prog, u32 kfunc_id)
if (!btf_id_set8_contains(&kprobe_multi_kfunc_set_ids, kfunc_id))
return 0;
 
-   if (!is_kprobe_session(prog))
+   if (!is_kprobe_session(prog) &&
+   !is_uprobe_session(prog))
return -EACCES;
 
return 0;
-- 
2.45.1




[RFC bpf-next 02/10] bpf: Add support for uprobe multi session attach

2024-06-04 Thread Jiri Olsa
Adding support to attach a bpf program for the entry and return probe
of the same function. This is a common use case which at the moment
requires creating two uprobe multi links.

Adding a new BPF_TRACE_UPROBE_SESSION attach type that instructs the
kernel to attach a single link program to both the entry and exit
probe.

It's possible to control execution of the bpf program on the return
probe via the return value of the entry bpf program: returning zero
executes the bpf program on the return probe, returning non-zero
skips it.
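
In bpf program terms, a minimal sketch (my_pid and my_func are
placeholders; the section format comes from the libbpf patch in this
series):

	int my_pid;	/* set from user space before attaching */

	SEC("uprobe.session//proc/self/exe:my_func")
	int prog(struct pt_regs *ctx)
	{
		if (bpf_session_is_return())
			return 0;	/* return-side invocation */

		/* entry: non-zero cancels the return probe for this call */
		return (bpf_get_current_pid_tgid() >> 32) != my_pid;
	}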

Signed-off-by: Jiri Olsa 
---
 include/uapi/linux/bpf.h   |  1 +
 kernel/bpf/syscall.c   |  9 --
 kernel/trace/bpf_trace.c   | 50 +-
 tools/include/uapi/linux/bpf.h |  1 +
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 25ea393cf084..b400f50e2c3c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1116,6 +1116,7 @@ enum bpf_attach_type {
BPF_NETKIT_PRIMARY,
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
+   BPF_TRACE_UPROBE_SESSION,
__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5070fa20d05c..71d279907a0c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4048,10 +4048,14 @@ static int bpf_prog_attach_check_attach_type(const 
struct bpf_prog *prog,
if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI &&
attach_type != BPF_TRACE_UPROBE_MULTI)
return -EINVAL;
+   if (prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION &&
+   attach_type != BPF_TRACE_UPROBE_SESSION)
+   return -EINVAL;
if (attach_type != BPF_PERF_EVENT &&
attach_type != BPF_TRACE_KPROBE_MULTI &&
attach_type != BPF_TRACE_KPROBE_SESSION &&
-   attach_type != BPF_TRACE_UPROBE_MULTI)
+   attach_type != BPF_TRACE_UPROBE_MULTI &&
+   attach_type != BPF_TRACE_UPROBE_SESSION)
return -EINVAL;
return 0;
case BPF_PROG_TYPE_SCHED_CLS:
@@ -5314,7 +5318,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t 
uattr)
else if (attr->link_create.attach_type == 
BPF_TRACE_KPROBE_MULTI ||
 attr->link_create.attach_type == 
BPF_TRACE_KPROBE_SESSION)
ret = bpf_kprobe_multi_link_attach(attr, prog);
-   else if (attr->link_create.attach_type == 
BPF_TRACE_UPROBE_MULTI)
+   else if (attr->link_create.attach_type == 
BPF_TRACE_UPROBE_MULTI ||
+attr->link_create.attach_type == 
BPF_TRACE_UPROBE_SESSION)
ret = bpf_uprobe_multi_link_attach(attr, prog);
break;
default:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f5154c051d2c..53b111c8e887 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1642,6 +1642,17 @@ static inline bool is_kprobe_session(const struct 
bpf_prog *prog)
return prog->expected_attach_type == BPF_TRACE_KPROBE_SESSION;
 }
 
+static inline bool is_uprobe_multi(const struct bpf_prog *prog)
+{
+   return prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI ||
+  prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION;
+}
+
+static inline bool is_uprobe_session(const struct bpf_prog *prog)
+{
+   return prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION;
+}
+
 static const struct bpf_func_proto *
 kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1659,13 +1670,13 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
case BPF_FUNC_get_func_ip:
if (is_kprobe_multi(prog))
return &bpf_get_func_ip_proto_kprobe_multi;
-   if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI)
+   if (is_uprobe_multi(prog))
return &bpf_get_func_ip_proto_uprobe_multi;
return &bpf_get_func_ip_proto_kprobe;
case BPF_FUNC_get_attach_cookie:
if (is_kprobe_multi(prog))
return &bpf_get_attach_cookie_proto_kmulti;
-   if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI)
+   if (is_uprobe_multi(prog))
return &bpf_get_attach_cookie_proto_umulti;
return &bpf_get_attach_cookie_proto_trace;
default:
@@ -3346,6 +3357,26 @@ uprobe_multi_link_ret_handler(struct uprobe_consumer 
*con, unsigned long func, s
return uprobe_prog_run(uprobe, func, regs);
 }
 
+static int
+uprobe_multi_link_handler_session(struct uprobe_consumer *con, struct pt_regs 
*regs,
+ unsigned long *data)
+{
+   struct bpf_uprobe *uprobe;
+
+   uprobe = container_of(con, 

[RFC bpf-next 01/10] uprobe: Add session callbacks to uprobe_consumer

2024-06-04 Thread Jiri Olsa
Adding a new set of callbacks that are triggered on entry and return
uprobe execution for the attached function.

The session means that those callbacks are 'connected' in a way
that makes it possible to:
  - control execution of the return callback from the entry callback
  - share data between entry and return callbacks

The session concept fits our common use case where we do filtering
in the entry uprobe and, based on the result, decide whether to run
the return uprobe.

It's also convenient to share the data between session callbacks.

The control of return uprobe execution is done via the return value
of the entry session callback, where 0 means to install and execute
the return uprobe, and 1 means to not install it.
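
A consumer using the new callbacks would look roughly like this (a
sketch; the names are illustrative, the signatures follow the diff
below):

	static int my_entry(struct uprobe_consumer *self, struct pt_regs *regs,
			    unsigned long *data)
	{
		*data = instruction_pointer(regs);	/* shared with return */
		return 0;	/* 0: install and run the return uprobe */
	}

	static int my_ret(struct uprobe_consumer *self, unsigned long func,
			  struct pt_regs *regs, unsigned long *data)
	{
		/* *data holds what my_entry() stored for this instance */
		return 0;
	}

	static struct uprobe_consumer my_consumer = {
		.handler_session	= my_entry,
		.ret_handler_session	= my_ret,
	};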

The current implementation has a restriction that allows only a single
consumer with session callbacks to be registered for a uprobe, and it
also excludes standard callback consumers.

Which means that there can be only a single user of a uprobe (inode +
offset) when a session consumer is registered to it.

This is because all registered consumers are executed when the uprobe
or return uprobe is hit, and without an additional layer (like fgraph's
shadow stack) that would keep the state of the return callback, we have
no way to find out which consumer should be executed.

I'm not sure how big a limitation this is for people; our current use
case seems to be ok with that. Fixing this would be a more
complex/bigger change to uprobes - thoughts?

Hence sending this as RFC to gather more opinions and feedback.

Signed-off-by: Jiri Olsa 
---
 include/linux/uprobes.h | 18 +++
 kernel/events/uprobes.c | 69 +++--
 2 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..a2f2d5ac3cee 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -34,6 +34,12 @@ enum uprobe_filter_ctx {
 };
 
 struct uprobe_consumer {
+   /*
+* The handler callback return value controls removal of the uprobe.
+*  0 on success, uprobe stays
+*  1 on failure, remove the uprobe
+*console warning for anything else
+*/
int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
int (*ret_handler)(struct uprobe_consumer *self,
unsigned long func,
@@ -42,6 +48,17 @@ struct uprobe_consumer {
enum uprobe_filter_ctx ctx,
struct mm_struct *mm);
 
+   /* The handler_session callback return value controls execution of
+* the return uprobe and ret_handler_session callback.
+*  0 on success
+*  1 on failure, DO NOT install/execute the return uprobe
+*console warning for anything else
+*/
+   int (*handler_session)(struct uprobe_consumer *self, struct pt_regs 
*regs,
+  unsigned long *data);
+   int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long 
func,
+  struct pt_regs *regs, unsigned long *data);
+
struct uprobe_consumer *next;
 };
 
@@ -85,6 +102,7 @@ struct return_instance {
unsigned long   func;
unsigned long   stack;  /* stack pointer */
unsigned long   orig_ret_vaddr; /* original return address */
+   unsigned long   data;
boolchained;/* true, if instance is nested 
*/
 
struct return_instance  *next;  /* keep as stack */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..17b0771272a6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -750,12 +750,32 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
return uprobe;
 }
 
-static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
+/*
+ * Make sure all the uprobe consumers have only one type of entry
+ * callback registered (either handler or handler_session) due to
+ * different return value actions.
+ */
+static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer 
*uc)
+{
+   if (!curr)
+   return 0;
+   if (curr->handler_session || uc->handler_session)
+   return -EBUSY;
+   return 0;
+}
+
+static int consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
+   int err;
+
down_write(&uprobe->consumer_rwsem);
-   uc->next = uprobe->consumers;
-   uprobe->consumers = uc;
+   err = consumer_check(uprobe->consumers, uc);
+   if (!err) {
+   uc->next = uprobe->consumers;
+   uprobe->consumers = uc;
+   }
up_write(&uprobe->consumer_rwsem);
+   return err;
 }
 
 /*
@@ -1114,6 +1134,21 @@ void uprobe_unregister(struct inode *inode, loff_t 
offset, struct uprobe_consume
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
+static int check_handler(struct uprobe_consumer *uc)
+{
+   /* 

[RFC bpf-next 00/10] uprobe, bpf: Add session support

2024-06-04 Thread Jiri Olsa
hi,
this patchset is adding support for session uprobe attachment
and using it through bpf link for bpf programs.

The session means that the uprobe consumer is executed on entry
and return of the probed function with additional control:
  - entry callback can control execution of the return callback
  - entry and return callbacks can share data/cookie

For more details please see patch #1.

thanks,
jirka


---
Jiri Olsa (10):
  uprobe: Add session callbacks to uprobe_consumer
  bpf: Add support for uprobe multi session attach
  bpf: Add support for uprobe multi session context
  libbpf: Add support for uprobe multi session attach
  libbpf: Add uprobe session attach type names to attach_type_name
  selftests/bpf: Move ARRAY_SIZE to bpf_misc.h
  selftests/bpf: Add uprobe session test
  selftests/bpf: Add uprobe session errors test
  selftests/bpf: Add uprobe session cookie test
  selftests/bpf: Add uprobe session recursive test

 include/linux/uprobes.h|  18 
++
 include/uapi/linux/bpf.h   |   1 +
 kernel/bpf/syscall.c   |   9 +++--
 kernel/events/uprobes.c|  69 
+-
 kernel/trace/bpf_trace.c   |  72 
+++-
 tools/include/uapi/linux/bpf.h |   1 +
 tools/lib/bpf/bpf.c|   1 +
 tools/lib/bpf/libbpf.c |  51 
++--
 tools/lib/bpf/libbpf.h |   4 ++-
 tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c | 153 

 tools/testing/selftests/bpf/progs/bpf_misc.h   |   2 ++
 tools/testing/selftests/bpf/progs/iters.c  |   2 --
 tools/testing/selftests/bpf/progs/kprobe_multi_session.c   |   3 +-
 tools/testing/selftests/bpf/progs/linked_list.c|   5 +--
 tools/testing/selftests/bpf/progs/netif_receive_skb.c  |   5 +--
 tools/testing/selftests/bpf/progs/profiler.inc.h   |   5 +--
 tools/testing/selftests/bpf/progs/setget_sockopt.c |   5 +--
 tools/testing/selftests/bpf/progs/test_bpf_ma.c|   4 ---
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c  |   5 +--
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c  |   5 +--
 tools/testing/selftests/bpf/progs/test_sysctl_prog.c   |   5 +--
 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c  |   1 +
 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h  |   2 --
 tools/testing/selftests/bpf/progs/uprobe_multi_session.c   |  52 
+
 tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c|  50 

 tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c |  44 

 tools/testing/selftests/bpf/progs/verifier_subprog_precision.c |   2 --
 27 files changed, 507 insertions(+), 69 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_multi_session.c
 create mode 100644 
tools/testing/selftests/bpf/progs/uprobe_multi_session_cookie.c
 create mode 100644 
tools/testing/selftests/bpf/progs/uprobe_multi_session_recursive.c



[PATCH] ftrace/selftests: Fix pid test with function graph not showing pids

2024-06-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The pid filtering test will set the pid filters and make sure that both
function and function_graph tracing honors the filters. But the
function_graph tracer test was failing because the PID was not being
filtered properly. That's because the funcgraph-proc option wasn't getting
set. Without that option the PID is not shown.

Instead we get:

+ cat trace
# tracer: function_graph
#
# CPU  DURATION  FUNCTION CALLS
# | |   | |   |   |   |
 3) ! 143.685 us  |  kernel_clone();
 3) ! 127.055 us  |  kernel_clone();
 1) ! 127.170 us  |  kernel_clone();
 3) ! 126.840 us  |  kernel_clone();

When we should be getting:

+ cat trace
# tracer: function_graph
#
# CPU  TASK/PID DURATION  FUNCTION CALLS
# | ||   |   | |   |   |   |
 4)bash-939| # 1070.009 us |  kernel_clone();
 4)bash-939| # 1116.903 us |  kernel_clone();
 5)bash-939| ! 976.133 us  |  kernel_clone();
 5)bash-939| ! 954.012 us  |  kernel_clone();

The test looks for the pids it is filtering and will fail if it can not
find them. Without the funcgraph-proc option set, the PID will not be
displayed and the test will fail.

Link: https://lore.kernel.org/all/Zl9JFnzKGuUM10X2@J2N7QTR9R3/

Fixes: 35b944a997e2 ("selftests/ftrace: Add function_graph tracer to 
func-filter-pid test")
Reported-by: Mark Rutland 
Signed-off-by: Steven Rostedt (Google) 
---
 tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc 
b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index c6fc9d31a496..8dcce001881d 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -8,6 +8,7 @@
 # Also test it on an instance directory
 
 do_function_fork=1
+do_funcgraph_proc=1
 
 if [ ! -f options/function-fork ]; then
 do_function_fork=0
@@ -28,6 +29,7 @@ fi
 
 if [ $do_funcgraph_proc -eq 1 ]; then
 orig_value2=`cat options/funcgraph-proc`
+echo 1 > options/funcgraph-proc
 fi
 
 do_reset() {
-- 
2.43.0




Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 14:57:42 -0400
Steven Rostedt  wrote:

> Bah, I just ran the test.d/ftrace/func-filter-pid.tc and it fails too. This
> did pass my other tests that do run ftracetests. Hmm, I just ran it on my
> test box that does the tests and it passes there. I wonder if there's some
> config option that makes it fail :-/
> 
> Well, now that I see it fail, I can investigate.

Ah, figured it out. The updated pid test isn't working. This explains why
my test machine didn't fail, as it doesn't have the updated ftracetests.

The problem was that I never set the funcgraph-proc option, even though I
saved it :-p

That is, it shows:

>   + cat trace
>   # tracer: function_graph
>   #
>   # CPU  DURATION  FUNCTION CALLS
>   # | |   | |   |   |   |
>3) ! 143.685 us  |  kernel_clone();
>3) ! 127.055 us  |  kernel_clone();
>1) ! 127.170 us  |  kernel_clone();
>3) ! 126.840 us  |  kernel_clone();

But when you do: echo 1 > options/funcgraph-proc

You get:

# cat trace
# tracer: function_graph
#
# CPU  TASK/PID DURATION  FUNCTION CALLS
# | ||   |   | |   |   |   |
 4)bash-939| # 1070.009 us |  kernel_clone();
 4)bash-939| # 1116.903 us |  kernel_clone();
 5)bash-939| ! 976.133 us  |  kernel_clone();
 5)bash-939| ! 954.012 us  |  kernel_clone();
 5)bash-939| ! 905.825 us  |  kernel_clone();
 5)bash-939| # 1130.922 us |  kernel_clone();
 7)bash-939| # 1097.648 us |  kernel_clone();
 0)bash-939| # 1008.000 us |  kernel_clone();
 3)bash-939| # 1023.391 us |  kernel_clone();
 4)bash-939| # 1033.008 us |  kernel_clone();
 4)bash-939| ! 949.072 us  |  kernel_clone();
 4)bash-939| # 1027.990 us |  kernel_clone();
 4)bash-939| ! 954.678 us  |  kernel_clone();
 4)bash-939| ! 996.557 us  |  kernel_clone();

Without that option, function graph does not show what process is being
recorded (except at a sched switch).

Can you add this patch to the test and see if it works again?

-- Steve

diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc 
b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index c6fc9d31a496..8dcce001881d 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -8,6 +8,7 @@
 # Also test it on an instance directory
 
 do_function_fork=1
+do_funcgraph_proc=1
 
 if [ ! -f options/function-fork ]; then
 do_function_fork=0
@@ -28,6 +29,7 @@ fi
 
 if [ $do_funcgraph_proc -eq 1 ]; then
 orig_value2=`cat options/funcgraph-proc`
+echo 1 > options/funcgraph-proc
 fi
 
 do_reset() {



Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 18:04:22 +0100
Mark Rutland  wrote:

> > There may have been something arch specific that I'm unaware about. I'll
> > look at that deeper.  
> 
> It looks like there are lines in the trace that it doesn't expect:
> 
>   + cat trace
>   + grep -v ^#
>   + grep 970
>   + wc -l
>   + count_pid=0
>   + cat trace
>   + grep -v ^#
>   + grep -v 970
>   + wc -l
>   + count_other=3
>   + [ 0 -eq 0 -o 3 -ne 0 ]
>   + fail PID filtering not working?
> 
> ... where we expect that count_other to be 0.
> 
> I hacked in a 'cat trace' just before the 'fail' and that shows:
> 
>   + cat trace
>   # tracer: function_graph
>   #
>   # CPU  DURATION  FUNCTION CALLS
>   # | |   | |   |   |   |
>3) ! 143.685 us  |  kernel_clone();
>3) ! 127.055 us  |  kernel_clone();
>1) ! 127.170 us  |  kernel_clone();
>3) ! 126.840 us  |  kernel_clone();
> 
> I'm not sure if that's legitimate output the test is failing to account
> for or if that indicates a kernel-side issue.

Bah, I just ran the test.d/ftrace/func-filter-pid.tc and it fails too. This
did pass my other tests that do run ftracetests. Hmm, I just ran it on my
test box that does the tests and it passes there. I wonder if there's some
config option that makes it fail :-/

Well, now that I see it fail, I can investigate.

-- Steve



Re: [PATCH v2 2/3] tracing/fprobe: Support raw tracepoint events on modules

2024-06-04 Thread Mathieu Desnoyers

On 2024-06-04 12:34, Steven Rostedt wrote:

On Tue, 4 Jun 2024 11:02:16 -0400
Mathieu Desnoyers  wrote:


I see.

It looks like there are a few things we could improve there:

1) With your approach, modules need to be already loaded before
attaching an fprobe event to them. This effectively prevents
attaching to any module init code. Is there any way we could allow
this by implementing a module coming notifier in fprobe as well ?
This would require that fprobes are kept around in a data structure
that matches the modules when they are loaded in the coming notifier.


The above sounds like a nice enhancement, but not something necessary for
this series.


IMHO it is nevertheless relevant to discuss the impact of supporting
this kind of use-case on the ABI presented to userspace, at least to
validate that what is exposed today can incrementally be enhanced
towards that goal.

I'm not saying that it needs to be implemented today, but we should
at least give it some thoughts right now to make sure the ABI is a
good fit.



2) Given that the fprobe module going notifier is protected by the
event_mutex, can we use locking rather than reference counting
in fprobe attach to guarantee the target module is not reclaimed
concurrently ? This would remove the transient side-effect of
holding a module reference count which temporarily prevents module
unload.


Why do we care about unloading modules during the transition? Note, module
unload has always been considered a second class citizen, and there's been
talks in the past to even rip it out.


As a general rule I try to ensure tracing has as little impact as
possible on the system behavior, so that issues that occur without
tracing can be reproduced with instrumentation.

On systems where modules are loaded/unloaded with udev, holding
references on modules can spuriously prevent module unload, which
as a consequence changes the system behavior.

About the relative importance of the various kernel subsystems,
following your reasoning that module unload is considered a
second-class citizen within the kernel, I would argue that tracing
is a third-class citizen and should not needlessly modify the
behavior of classes above it.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Daniel Bristot de Oliveira
On 6/4/24 18:37, Steven Rostedt wrote:
> On Tue, 4 Jun 2024 17:57:46 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> On 6/4/24 16:42, Qais Yousef wrote:
>>> -   (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
>>> +   (wakeup_rt && !realtime_task(p)) ||  
>>
>> I do not like bikeshedding, and no hard feelings...
>>
>> But rt is a shortened version of realtime, and so it is making *it less*
>> clear that we also have DL here.
>>
>> I know we can always read the comments, but we can do without changes
>> as well...
>>
>> I would suggest finding the plural version for realtime_task()... so
>> we know it is not about the "rt" scheduler, but rt and dl schedulers.
> 
> priority_task() ?

rt_or_dl_task() ?
rt_schedulers_task() ?
higher_than_fair_task() ? (this is bad haha)

I am not good with names, and it is hard to find one, I know, but we need
something to make it clear that dl is also there, because rt/realtime is
ambiguous with rt.c.

-- Daniel




Re: [PATCH v2 2/4] perf,uprobes: fix user stack traces in the presence of pending uretprobes

2024-06-04 Thread Andrii Nakryiko
On Tue, Jun 4, 2024 at 7:13 AM Masami Hiramatsu  wrote:
>
> On Tue, 21 May 2024 18:38:43 -0700
> Andrii Nakryiko  wrote:
>
> > When kernel has pending uretprobes installed, it hijacks original user
> > function return address on the stack with a uretprobe trampoline
> > address. There could be multiple such pending uretprobes (either on
> > different user functions or on the same recursive one) at any given
> > time within the same task.
> >
> > This approach interferes with the user stack trace capture logic, which
> > would report surprising addresses (like 0x7fffe000) that correspond
> > to a special "[uprobes]" section that kernel installs in the target
> > process address space for uretprobe trampoline code, while logically it
> > should be an address somewhere within the calling function of another
> > traced user function.
> >
> > This is easy to correct for, though. Uprobes subsystem keeps track of
> > pending uretprobes and records original return addresses. This patch is
> > using this to do a post-processing step and restore each trampoline
> > address entries with correct original return address. This is done only
> > if there are pending uretprobes for current task.
> >
> > This is a similar approach to what fprobe/kretprobe infrastructure is
> > doing when capturing kernel stack traces in the presence of pending
> > return probes.
> >
>
> This looks good to me because this trampoline information is only
> managed in uprobes. And it should be provided when unwinding user
> stack.
>
> Reviewed-by: Masami Hiramatsu (Google) 
>
> Thank you!

Great, thanks for reviewing, Masami!

Would you take this fix through your tree, or where should it be routed to?

>
> > Reported-by: Riham Selim 
> > Signed-off-by: Andrii Nakryiko 
> > ---
> >  kernel/events/callchain.c | 43 ++-
> >  kernel/events/uprobes.c   |  9 
> >  2 files changed, 51 insertions(+), 1 deletion(-)
> >

[...]



[PATCH v3] dt-bindings: remoteproc: k3-dsp: correct optional sram properties for AM62A SoCs

2024-06-04 Thread Hari Nagalla
The C7xv-dsp on AM62A have 32KB L1 I-cache and a 64KB L1 D-cache. It
does not have an addressable l1dram . So, remove this optional sram
property from the bindings to fix device tree build warnings.

Signed-off-by: Hari Nagalla 
---
Changes in v3:
*) Use allOf keyword with separate ifs for each variant instead 
   of nested if/else conditions.

v2: https://lore.kernel.org/all/20240530164816.1051-1-hnaga...@ti.com/

 .../bindings/remoteproc/ti,k3-dsp-rproc.yaml  | 89 +++
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/ti,k3-dsp-rproc.yaml b/Documentation/devicetree/bindings/remoteproc/ti,k3-dsp-rproc.yaml
index 9768db8663eb..b51bb863d759 100644
--- a/Documentation/devicetree/bindings/remoteproc/ti,k3-dsp-rproc.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/ti,k3-dsp-rproc.yaml
@@ -25,9 +25,6 @@ description: |
   host processor (Arm CorePac) to perform the device management of the remote
   processor and to communicate with the remote processor.
 
-allOf:
-  - $ref: /schemas/arm/keystone/ti,k3-sci-common.yaml#
-
 properties:
   compatible:
 enum:
@@ -89,41 +86,57 @@ properties:
   should be defined as per the generic bindings in,
   Documentation/devicetree/bindings/sram/sram.yaml
 
-if:
-  properties:
-compatible:
-  enum:
-- ti,j721e-c66-dsp
-then:
-  properties:
-reg:
-  items:
-- description: Address and Size of the L2 SRAM internal memory region
-- description: Address and Size of the L1 PRAM internal memory region
-- description: Address and Size of the L1 DRAM internal memory region
-reg-names:
-  items:
-- const: l2sram
-- const: l1pram
-- const: l1dram
-else:
-  if:
-properties:
-  compatible:
-enum:
-  - ti,am62a-c7xv-dsp
-  - ti,j721e-c71-dsp
-  - ti,j721s2-c71-dsp
-  then:
-properties:
-  reg:
-items:
-  - description: Address and Size of the L2 SRAM internal memory region
-  - description: Address and Size of the L1 DRAM internal memory region
-  reg-names:
-items:
-  - const: l2sram
-  - const: l1dram
+allOf:
+  - if:
+  properties:
+compatible:
+  enum:
+- ti,j721e-c66-dsp
+then:
+  properties:
+reg:
+  items:
+- description: Address and Size of the L2 SRAM internal memory region
+- description: Address and Size of the L1 PRAM internal memory region
+- description: Address and Size of the L1 DRAM internal memory region
+reg-names:
+  items:
+- const: l2sram
+- const: l1pram
+- const: l1dram
+
+  - if:
+  properties:
+compatible:
+  enum:
+- ti,j721e-c71-dsp
+- ti,j721s2-c71-dsp
+then:
+  properties:
+reg:
+  items:
+- description: Address and Size of the L2 SRAM internal memory region
+- description: Address and Size of the L1 DRAM internal memory region
+reg-names:
+  items:
+- const: l2sram
+- const: l1dram
+
+  - if:
+  properties:
+compatible:
+  enum:
+- ti,am62a-c7xv-dsp
+then:
+  properties:
+reg:
+  items:
+- description: Address and Size of the L2 SRAM internal memory region
+reg-names:
+  items:
+- const: l2sram
+
+  - $ref: /schemas/arm/keystone/ti,k3-sci-common.yaml#
 
 required:
   - compatible
-- 
2.34.1




Re: [PATCH v2 3/4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-06-04 Thread Andrii Nakryiko
On Tue, Jun 4, 2024 at 7:06 AM Masami Hiramatsu  wrote:
>
> On Tue, 21 May 2024 18:38:44 -0700
> Andrii Nakryiko  wrote:
>
> > When tracing user functions with uprobe functionality, it's common to
> > install the probe (e.g., a BPF program) at the first instruction of the
> > function. This is often going to be `push %rbp` instruction in function
> > preamble, which means that within that function frame pointer hasn't
> > been established yet. This leads to consistently missing an actual
> > caller of the traced function, because perf_callchain_user() only
> > records current IP (capturing traced function) and then following frame
> > pointer chain (which would be caller's frame, containing the address of
> > caller's caller).
>
> I thought this problem might be solved by sframe.

Eventually, yes, when real-world applications switch to sframe and we
get proper support for it in the kernel. But right now there are tons
of applications relying on kernel capturing stack traces based on
frame pointers, so it would be good to improve that as well.

>
> >
> > So when we have target_1 -> target_2 -> target_3 call chain and we are
> > tracing an entry to target_3, captured stack trace will report
> > target_1 -> target_3 call chain, which is wrong and confusing.
> >
> > This patch proposes a x86-64-specific heuristic to detect `push %rbp`
> > instruction being traced.
>
> I like this kind of idea :) But I think this should be done in
> user space, not in the kernel, because it is not always certain
> that the user program uses stack frames.

Existing kernel code that captures user space stack trace already
assumes that code was compiled with a frame pointer (unconditionally),
because that's the best kernel can do. So under that assumption this
heuristic is valid and not harmful, IMO.

User space can do nothing about this, because it is the kernel that
captures stack trace (e.g., from BPF program), and we will lose the
calling frame if we don't do it here.
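
For reference, on x86-64 the heuristic boils down to something like the
following sketch (stand-in names, not the actual patch): at the first
instruction of a function, "push %rbp" has not executed yet, so the
caller's return address is still at the top of the user stack.

static void store_entry_caller(struct perf_callchain_entry_ctx *entry,
			       struct pt_regs *regs)
{
	u64 ret_addr;

	/* hypothetical check: the probed insn is "push %rbp" (opcode 0x55) */
	if (!probed_insn_is_push_rbp())
		return;

	/* no frame covers the return address yet; it sits at *%rsp */
	if (!__get_user(ret_addr, (u64 __user *)regs->sp))
		perf_callchain_store(entry, ret_addr);

	/* perf_callchain_user() then walks the %rbp chain as usual */
}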

>
> > If that's the case, with the assumption that
> > application is compiled with frame pointers, this instruction would be
> > a strong indicator that this is the entry to the function. In that case,
> > return address is still pointed to by %rsp, so we fetch it and add to
> > stack trace before proceeding to unwind the rest using frame
> > pointer-based logic.
>
> Why don't we do it in the userspace BPF program? If it is done
> in user space, like perf-probe, I'm OK with it. But I have doubts about
> doing this in the kernel; it is not flexible.
>

You mean for the BPF program to capture the entire stack trace by
itself, without asking the kernel for help? It's doable, but:

  a) it's inconvenient for all users to have to reimplement this
low-level "primitive" operation that we already promise is provided
by the kernel (through the bpf_get_stack() API; the kernel also has an
internal perf_callchain API for this)
  b) it's faster for the kernel to do this, as kernel code disables page
faults once and unwinds the stack, while a BPF program would have to do
multiple bpf_probe_read_user() calls, each individually disabling page
faults.

But really, there is an already existing API, which in some cases
returns partially invalid stack traces (skipping function caller's
frame), so this is an attempt to fix this issue.
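
To make the division of labor concrete, a minimal libbpf-style consumer
looks like this (a sketch; the SEC() target and all names are
hypothetical). The program only asks the kernel for the stack, so any
fix has to live on the kernel side:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("uprobe//usr/bin/app:target_3")
int capture_user_stack(struct pt_regs *ctx)
{
	__u64 ips[64];
	long n = bpf_get_stack(ctx, ips, sizeof(ips), BPF_F_USER_STACK);

	if (n < 0)
		return 0;
	/* ips[] now holds n bytes of user IPs; with pending uretprobes,
	 * some entries would point at the trampoline without this fix. */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";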


> More than anything, without a user-space helper to find function
> symbols, uprobe does not know the function entry. So I'm curious
> why you don't do this in user space.

You are mixing stack *capture* (in which we get memory addresses
representing where a function call or currently running instruction
pointer is) with stack *symbolization* (where user space needs ELF
symbols and/or DWARF information to translate those addresses into
something human-readable).

This heuristic improves the former as performed by the kernel. Stack
symbolization is completely orthogonal to all of this.

>
> At least, this should be done in the user of uprobes, like trace_uprobe
> or bpf.
>

This is a really miserable user experience if they have to implement
their own stack trace capture for uprobes, but use the built-in
bpf_get_stack() API for any other type of program.

>
> Thank you,
>
> >
> > Signed-off-by: Andrii Nakryiko 
> > ---
> >  arch/x86/events/core.c  | 20 
> >  include/linux/uprobes.h |  2 ++
> >  kernel/events/uprobes.c |  2 ++
> >  3 files changed, 24 insertions(+)
> >

[...]



Re: [PATCH v2 2/3] remoteproc: k3-r5: Acquire mailbox handle during probe

2024-06-04 Thread Andrew Davis

On 6/4/24 12:17 AM, Beleswar Padhi wrote:

Acquire the mailbox handle during device probe and do not release the
handle in the stop/detach routine or error paths. This removes the
redundant requests for the mbox handle later during rproc start/attach.
This also allows deferring the remoteproc driver's probe if the mailbox
is not probed yet.

Signed-off-by: Beleswar Padhi 
---
  drivers/remoteproc/ti_k3_r5_remoteproc.c | 74 +---
  1 file changed, 26 insertions(+), 48 deletions(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 26362a509ae3c..7e02e3472ce25 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -194,6 +194,10 @@ static void k3_r5_rproc_mbox_callback(struct mbox_client *client, void *data)
const char *name = kproc->rproc->name;
u32 msg = omap_mbox_message(data);
  
+	/* Do not forward message to a detached core */


s/to/from

This is the receive side from the core.


+   if (kproc->rproc->state == RPROC_DETACHED)
+   return;
+


Do we need a similar check when sending messages to the core in
k3_r5_rproc_kick()? No one should be sending anything as they
all should have detached at this point, but something to double
check on.
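
For reference, the symmetric guard on the send side would look roughly
like this (an illustrative sketch following the shape of the driver's
existing kick handler; whether it is actually needed is the open
question above):

static void k3_r5_rproc_kick(struct rproc *rproc, int vqid)
{
	struct k3_r5_rproc *kproc = rproc->priv;
	mbox_msg_t msg = (mbox_msg_t)vqid;

	/* Do not kick a detached core */
	if (kproc->rproc->state == RPROC_DETACHED)
		return;

	/* send the index of the triggered virtqueue in the mailbox payload */
	if (mbox_send_message(kproc->mbox, (void *)msg) < 0)
		dev_err(rproc->dev.parent, "failed to send mailbox message\n");
}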


dev_dbg(dev, "mbox msg: 0x%x\n", msg);
  
  	switch (msg) {

@@ -399,12 +403,9 @@ static int k3_r5_rproc_request_mbox(struct rproc *rproc)
client->knows_txdone = false;
  
  	kproc->mbox = mbox_request_channel(client, 0);

-   if (IS_ERR(kproc->mbox)) {
-   ret = -EBUSY;
-   dev_err(dev, "mbox_request_channel failed: %ld\n",
-   PTR_ERR(kproc->mbox));
-   return ret;
-   }
+   if (IS_ERR(kproc->mbox))
+   return dev_err_probe(dev, PTR_ERR(kproc->mbox),
+"mbox_request_channel failed\n");


This is good cleanup, but maybe something for its own patch.

  
  	/*

 * Ping the remote processor, this is only for sanity-sake for now;
@@ -552,10 +553,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
u32 boot_addr;
int ret;
  
-	ret = k3_r5_rproc_request_mbox(rproc);

-   if (ret)
-   return ret;
-
boot_addr = rproc->bootaddr;
/* TODO: add boot_addr sanity checking */
dev_dbg(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
@@ -564,7 +561,7 @@ static int k3_r5_rproc_start(struct rproc *rproc)
core = kproc->core;
ret = ti_sci_proc_set_config(core->tsp, boot_addr, 0, 0);
if (ret)
-   goto put_mbox;
+   return ret;
  
  	/* unhalt/run all applicable cores */

if (cluster->mode == CLUSTER_MODE_LOCKSTEP) {
@@ -580,13 +577,12 @@ static int k3_r5_rproc_start(struct rproc *rproc)
if (core != core0 && core0->rproc->state == RPROC_OFFLINE) {
dev_err(dev, "%s: can not start core 1 before core 0\n",
__func__);
-   ret = -EPERM;
-   goto put_mbox;
+   return -EPERM;
}
  
  		ret = k3_r5_core_run(core);

if (ret)
-   goto put_mbox;
+   return ret;
}
  
  	return 0;

@@ -596,8 +592,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
if (k3_r5_core_halt(core))
dev_warn(core->dev, "core halt back failed\n");
}
-put_mbox:
-   mbox_free_channel(kproc->mbox);
return ret;
  }
  
@@ -658,8 +652,6 @@ static int k3_r5_rproc_stop(struct rproc *rproc)

goto out;
}
  
-	mbox_free_channel(kproc->mbox);

-
return 0;
  
  unroll_core_halt:

@@ -674,42 +666,22 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
  /*
   * Attach to a running R5F remote processor (IPC-only mode)
   *
- * The R5F attach callback only needs to request the mailbox, the remote
- * processor is already booted, so there is no need to issue any TI-SCI
- * commands to boot the R5F cores in IPC-only mode. This callback is invoked
- * only in IPC-only mode.
+ * The R5F attach callback is a NOP. The remote processor is already booted, and
+ * all required resources have been acquired during probe routine, so there is
+ * no need to issue any TI-SCI commands to boot the R5F cores in IPC-only mode.
+ * This callback is invoked only in IPC-only mode and exists because
+ * rproc_validate() checks for its existence.
   */
-static int k3_r5_rproc_attach(struct rproc *rproc)
-{
-   struct k3_r5_rproc *kproc = rproc->priv;
-   struct device *dev = kproc->dev;
-   int ret;
-
-   ret = k3_r5_rproc_request_mbox(rproc);
-   if (ret)
-   return ret;
-
-   dev_info(dev, "R5F core initialized in IPC-only mode\n");
-   return 0;
-}
+static int k3_r5_rproc_attach(struct rproc *rproc) { return 0; }


I 

Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Mark Rutland
On Tue, Jun 04, 2024 at 12:31:24PM -0400, Steven Rostedt wrote:
> On Tue, 4 Jun 2024 15:44:40 +0100
> Mark Rutland  wrote:
> 
> > Hi Steve, Masami,
> > 
> > On Tue, Jun 04, 2024 at 08:18:50AM -0400, Steven Rostedt wrote:
> > > 
> > > Masami,
> > > 
> > > This series passed all my tests, are you comfortable with me pushing
> > > them to linux-next?  
> > 
> > As a heads-up (and not to block pushing this into next), I just gave
> > this a spin on arm64 atop v6.10-rc2, and running the selftests I see:
> > 
> > ftrace - function pid filters
> > (instance)  ftrace - function pid filters
> > 
> > ... both go from [PASS] to [FAIL].
> > 
> > Everything else looks good -- I'll go dig into why that's happening.
> > 
> > It's possible that's just something odd with the filesystem I'm using
> > (e.g. the ownership test failed because this lacks 'stat').
> 
> Thanks for the update. It could be something I missed in patch 13 that had
> to put back the pid code.
> 
> There may have been something arch specific that I'm unaware about. I'll
> look at that deeper.

It looks like there are lines in the trace that it doesn't expect:

+ cat trace
+ grep -v ^#
+ grep 970
+ wc -l
+ count_pid=0
+ cat trace
+ grep -v ^#
+ grep -v 970
+ wc -l
+ count_other=3
+ [ 0 -eq 0 -o 3 -ne 0 ]
+ fail PID filtering not working?

... where we expect that count_other to be 0.

I hacked in a 'cat trace' just before the 'fail' and that shows:

+ cat trace
# tracer: function_graph
#
# CPU  DURATION  FUNCTION CALLS
# | |   | |   |   |   |
 3) ! 143.685 us  |  kernel_clone();
 3) ! 127.055 us  |  kernel_clone();
 1) ! 127.170 us  |  kernel_clone();
 3) ! 126.840 us  |  kernel_clone();

I'm not sure if that's legitimate output the test is failing to account
for or if that indicates a kernel-side issue.

Mark.



Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 17:57:46 +0200
Daniel Bristot de Oliveira  wrote:

> On 6/4/24 16:42, Qais Yousef wrote:
> > -   (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
> > +   (wakeup_rt && !realtime_task(p)) ||  
> 
> I do not like bikeshedding, and no hard feelings...
> 
> But rt is a shortened version of realtime, and so it is making *it less*
> clear that we also have DL here.
> 
> I know we can always read the comments, but we can do without changes
> as well...
> 
> I would suggest finding the plural version for realtime_task()... so
> we know it is not about the "rt" scheduler, but rt and dl schedulers.

priority_task() ?

Or should we go with royal purple and call it "royalty_task()" ? ;-)

-- Steve





Re: [PATCH v2 2/3] tracing/fprobe: Support raw tracepoint events on modules

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 11:02:16 -0400
Mathieu Desnoyers  wrote:

> I see.
> 
> It looks like there are a few things we could improve there:
> 
> 1) With your approach, modules need to be already loaded before
> attaching an fprobe event to them. This effectively prevents
> attaching to any module init code. Is there any way we could allow
> this by implementing a module coming notifier in fprobe as well ?
> This would require that fprobes are kept around in a data structure
> that matches the modules when they are loaded in the coming notifier.

The above sounds like a nice enhancement, but not something necessary for
this series.

> 
> 2) Given that the fprobe module going notifier is protected by the
> event_mutex, can we use locking rather than reference counting
> in fprobe attach to guarantee the target module is not reclaimed
> concurrently ? This would remove the transient side-effect of
> holding a module reference count which temporarily prevents module
> unload.

Why do we care about unloading modules during the transition? Note, module
unload has always been considered a second class citizen, and there's been
talks in the past to even rip it out.

-- Steve



Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 15:44:40 +0100
Mark Rutland  wrote:

> Hi Steve, Masami,
> 
> On Tue, Jun 04, 2024 at 08:18:50AM -0400, Steven Rostedt wrote:
> > 
> > Masami,
> > 
> > This series passed all my tests, are you comfortable with me pushing
> > them to linux-next?  
> 
> As a heads-up (and not to block pushing this into next), I just gave
> this a spin on arm64 atop v6.10-rc2, and running the selftests I see:
> 
>   ftrace - function pid filters
>   (instance)  ftrace - function pid filters
> 
> ... both go from [PASS] to [FAIL].
> 
> Everything else looks good -- I'll go dig into why that's happening.
> 
> It's possible that's just something odd with the filesystem I'm using
> (e.g. the ownership test failed because this lacks 'stat').

Thanks for the update. It could be something I missed in patch 13 that had
to put back the pid code.

There may have been something arch specific that I'm unaware about. I'll
look at that deeper.

-- Steve



Re: [PATCH v5 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Steven Rostedt
On Tue,  4 Jun 2024 15:42:28 +0100
Qais Yousef  wrote:

> {rt, realtime, dl}_{task, prio}() functions' return value is actually
> a bool.  Convert their return type to reflect that.
> 
> Suggested-by: Steven Rostedt (Google) 
> Signed-off-by: Qais Yousef 
> ---
>  include/linux/sched/deadline.h |  8 +++-
>  include/linux/sched/rt.h   | 16 ++--
>  2 files changed, 9 insertions(+), 15 deletions(-)

Reviewed-by: Steven Rostedt (Google) 

-- Steve



Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Daniel Bristot de Oliveira
On 6/4/24 16:42, Qais Yousef wrote:
> - (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
> + (wakeup_rt && !realtime_task(p)) ||

I do not like bikeshedding, and no hard feelings...

But rt is a shortened version of realtime, and so it is making *it less*
clear that we also have DL here.

I know we can always read the comments, but we can do without changes
as well...

I would suggest finding the plural version for realtime_task()... so
we know it is not about the "rt" scheduler, but rt and dl schedulers.

-- Daniel




Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support

2024-06-04 Thread Mathieu Poirier
Hi Tanmay,

On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote:
> It is possible that the remote processor is already running before
> Linux boots or before the remoteproc platform driver probes. Implement
> the required remoteproc framework ops to provide the resource table
> address and connect or disconnect with the remote processor in such cases.
> 
> Signed-off-by: Tanmay Shah 
> 
> ---
> Changes in v4:
>   - Move change log out of commit text
> 
> Changes in v3:
> 
>   - Drop SRAM patch from the series
>   - Change type from "struct resource_table *" to void __iomem *
>   - Change comment format from /** to /*
>   - Remove unmap of resource table va address during detach, allowing
> attach-detach-reattach use case.
>   - Unmap rsc_data_va after retrieving resource table data structure.
>   - Unmap resource table va during driver remove op
> 
> Changes in v2:
> 
>   - Fix typecast warnings reported using sparse tool.
>   - Fix following sparse warnings:
> 
> drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: warning: incorrect type in assignment (different address spaces)
> drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: warning: incorrect type in assignment (different address spaces)
> drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: warning: incorrect type in argument 1 (different address spaces)
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++-
>  1 file changed, 169 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
> index 84243d1dff9f..6898d4761566 100644
> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> @@ -25,6 +25,10 @@
>  /* RX mailbox client buffer max length */
>  #define MBOX_CLIENT_BUF_MAX  (IPI_BUF_LEN_MAX + \
>sizeof(struct zynqmp_ipi_message))
> +
> +#define RSC_TBL_XLNX_MAGIC   ((uint32_t)'x' << 24 | (uint32_t)'a' << 16 | \
> +  (uint32_t)'m' << 8 | (uint32_t)'p')
> +
>  /*
>   * settings for RPU cluster mode which
>   * reflects possible values of xlnx,cluster-mode dt-property
> @@ -73,6 +77,26 @@ struct mbox_info {
>   struct mbox_chan *rx_chan;
>  };
>  
> +/**
> + * struct rsc_tbl_data
> + *
> + * Platform specific data structure used to sync resource table address.
> + * It's important to maintain order and size of each field on remote side.
> + *
> + * @version: version of data structure
> + * @magic_num: 32-bit magic number.
> + * @comp_magic_num: complement of above magic number
> + * @rsc_tbl_size: resource table size
> + * @rsc_tbl: resource table address
> + */
> +struct rsc_tbl_data {
> + const int version;
> + const u32 magic_num;
> + const u32 comp_magic_num;

I thought we agreed on making the magic number a u64 - did I get this wrong?
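
Whatever the final width, I assume the pair gets validated with a
complement check along these lines (a sketch; the helper name is made
up):

static bool rsc_tbl_data_is_valid(const struct rsc_tbl_data *rsc_data_va)
{
	return rsc_data_va->magic_num == RSC_TBL_XLNX_MAGIC &&
	       rsc_data_va->comp_magic_num == ~RSC_TBL_XLNX_MAGIC;
}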

> + const u32 rsc_tbl_size;
> + const uintptr_t rsc_tbl;
> +} __packed;
> +
>  /*
>   * Hardcoded TCM bank values. This will stay in driver to maintain backward
>   * compatibility with device-tree that does not have TCM information.
> @@ -95,20 +119,24 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
>  /**
>   * struct zynqmp_r5_core
>   *
> + * @rsc_tbl_va: resource table virtual address
>   * @dev: device of RPU instance
>   * @np: device node of RPU instance
>   * @tcm_bank_count: number TCM banks accessible to this RPU
>   * @tcm_banks: array of each TCM bank data
>   * @rproc: rproc handle
> + * @rsc_tbl_size: resource table size retrieved from remote
>   * @pm_domain_id: RPU CPU power domain id
>   * @ipi: pointer to mailbox information
>   */
>  struct zynqmp_r5_core {
> + void __iomem *rsc_tbl_va;
>   struct device *dev;
>   struct device_node *np;
>   int tcm_bank_count;
>   struct mem_bank_data **tcm_banks;
>   struct rproc *rproc;
> + u32 rsc_tbl_size;
>   u32 pm_domain_id;
>   struct mbox_info *ipi;
>  };
> @@ -621,10 +649,19 @@ static int zynqmp_r5_rproc_prepare(struct rproc *rproc)
>  {
>   int ret;
>  
> - ret = add_tcm_banks(rproc);
> - if (ret) {
> - dev_err(&rproc->dev, "failed to get TCM banks, err %d\n", ret);
> - return ret;
> + /*
> +  * For attach/detach use case, Firmware is already loaded so
> +  * TCM isn't really needed at all. Also, for security TCM can be
> +  * locked in such case and linux may not have access at all.
> +  * So avoid adding TCM banks. TCM power-domains requested during attach
> +  * callback.
> +  */
> + if (rproc->state != RPROC_DETACHED) {

if (rproc->state == RPROC_DETACHED)
return 0;

ret = add_tcm_banks(rproc);
if (ret) {
dev_err(&rproc->dev, "failed to get TCM banks, err %d\n", ret);
return ret;
}

> + ret = add_tcm_banks(rproc);
> + if (ret) {
> + dev_err(&rproc->dev, "failed to get TCM banks, err %d\n",

Re: [PATCH 0/2] Add HTC One (M8) support

2024-06-04 Thread Rob Herring (Arm)


On Mon, 03 Jun 2024 02:28:55 -0400, Alexandre Messier wrote:
> Add an initial device tree to support the HTC One (M8) smartphone,
> aka "htc,m8".
> 
> Signed-off-by: Alexandre Messier 
> ---
> Alexandre Messier (2):
>   dt-bindings: arm: qcom: add HTC One (M8)
>   ARM: dts: qcom: Add initial support for HTC One (M8)
> 
>  Documentation/devicetree/bindings/arm/qcom.yaml   |   1 +
>  arch/arm/boot/dts/qcom/Makefile   |   1 +
>  arch/arm/boot/dts/qcom/qcom-msm8974pro-htc-m8.dts | 353 ++
>  3 files changed, 355 insertions(+)
> ---
> base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
> change-id: 20240603-m8-support-9458b378f168
> 
> Best regards,
> --
> Alexandre Messier 
> 
> 
> 


My bot found new DTB warnings on the .dts files added or changed in this
series.

Some warnings may be from an existing SoC .dtsi. Or perhaps the warnings
are fixed by another series. Ultimately, it is up to the platform
maintainer whether these warnings are acceptable or not. No need to reply
unless the platform maintainer has comments.

If you already ran DT checks and didn't see these error(s), then
make sure dt-schema is up to date:

  pip3 install dtschema --upgrade


New warnings running 'make CHECK_DTBS=y qcom/qcom-msm8974pro-htc-m8.dtb' for 20240603-m8-support-v1-0-c7b6a1941...@me.ssier.org:

arch/arm/boot/dts/qcom/qcom-msm8974pro-htc-m8.dtb: l2-cache: Unevaluated properties are not allowed ('qcom,saw' was unexpected)
from schema $id: http://devicetree.org/schemas/cache.yaml#








Re: [PATCH v4 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Steven Rostedt
On Mon, 3 Jun 2024 08:33:53 +0100
Metin Kaya  wrote:

> On 01/06/2024 10:33 pm, Qais Yousef wrote:
> > {rt, realtime, dl}_{task, prio}() functions' return value is actually
> > a bool.  Convert their return type to reflect that.
> > 
> > Suggested-by: Steven Rostedt (Google) 
> > Signed-off-by: Qais Yousef 
> > ---
> >   include/linux/sched/deadline.h |  8 
> >   include/linux/sched/rt.h   | 16 
> >   2 files changed, 12 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> > index 5cb88b748ad6..f2053f46f1d5 100644
> > --- a/include/linux/sched/deadline.h
> > +++ b/include/linux/sched/deadline.h
> > @@ -10,18 +10,18 @@
> >   
> >   #include <linux/sched.h>
> >   
> > -static inline int dl_prio(int prio)
> > +static inline bool dl_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_DL_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;  
> 
> Nit: `return unlikely(prio < MAX_DL_PRIO)` would be simpler.
> The same can be applied to rt_prio() and realtime_prio(). This would 
> make {dl, rt, realtime}_task() single-liner. Maybe further 
> simplification can be done.

Agreed.

-- Steve



Re: [PATCH v2 3/3] sefltests/tracing: Add a test for tracepoint events on modules

2024-06-04 Thread Steven Rostedt
On Sat,  1 Jun 2024 17:22:55 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Masami Hiramatsu (Google) 
> 
> Add a test case for tracepoint events on modules. This checks if it can add
> and remove the events correctly.
> 
> Signed-off-by: Masami Hiramatsu (Google) 

Reviewed-by: Steven Rostedt (Google) 

-- Steve



Re: [PATCH v2 2/3] tracing/fprobe: Support raw tracepoint events on modules

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 08:49:55 +0900
Masami Hiramatsu (Google)  wrote:

> On Mon, 3 Jun 2024 15:50:55 -0400
> Mathieu Desnoyers  wrote:
> 
> > On 2024-06-01 04:22, Masami Hiramatsu (Google) wrote:  
> > > From: Masami Hiramatsu (Google) 
> > > 
> > > Support raw tracepoint events on modules via fprobe events.
> > > Since it only uses for_each_kernel_tracepoint() to find a tracepoint,
> > > tracepoints on modules are not handled. Thus if a user specifies a
> > > tracepoint on a module, it shows an error.
> > > This adds a new for_each_module_tracepoint() API to the tracepoint
> > > subsystem, and uses it to find tracepoints on modules.  
> > 
> > Hi Masami,
> > 
> > Why prevent module unload when a fprobe tracepoint is attached to a
> > module ? This changes the kernel's behavior significantly just for the
> > sake of instrumentation.  
> 
> I don't prevent module unloading all the time, just before registering
> tracepoint handler (something like booking a ticket :-) ).
> See the last hunk of this patch, it puts the module before exiting
> __trace_fprobe_create().
> 
> > 
> > As an alternative, LTTng-modules attach/detach to/from modules with the
> > coming/going notifiers, so the instrumentation gets removed when a
> > module is unloaded rather than preventing its unload by holding a module
> > reference count. I would recommend a similar approach for fprobe.  
> 
> Yes, since tracepoint subsystem provides a notifier API to notify the
> tracepoint is gone, fprobe already uses it to find unloading and
> unregister the target function. (see __tracepoint_probe_module_cb)
> 

Ah, it only prevents module unloading in __trace_fprobe_create()

> +static void __find_tracepoint_module_cb(struct tracepoint *tp, void *priv)
> +{
> + struct __find_tracepoint_cb_data *data = priv;
> +
> + if (!data->tpoint && !strcmp(data->tp_name, tp->name)) {
> + data->tpoint = tp;
> + data->mod = __module_text_address((unsigned long)tp->probestub);
> + if (!try_module_get(data->mod)) {

Here it gets the module. Should only happen once, as it sets data->tpoint.

> + data->tpoint = NULL;
> + data->mod = NULL;
> + }
> + }
> +}
> +
>  static void __find_tracepoint_cb(struct tracepoint *tp, void *priv)
>  {
>   struct __find_tracepoint_cb_data *data = priv;
> @@ -905,14 +922,28 @@ static void __find_tracepoint_cb(struct tracepoint *tp, void *priv)
>   data->tpoint = tp;
>  }
>  
> -static struct tracepoint *find_tracepoint(const char *tp_name)
> +/*
> + * Find a tracepoint from kernel and module. If the tracepoint is in a 
> module,
> + * this increments the module refcount to prevent unloading until the
> + * trace_fprobe is registered to the list. After registering the trace_fprobe
> + * on the trace_fprobe list, the module refcount is decremented because
> + * tracepoint_probe_module_cb will handle it.
> + */
> +static struct tracepoint *find_tracepoint(const char *tp_name,
> +   struct module **tp_mod)
>  {
>   struct __find_tracepoint_cb_data data = {
>   .tp_name = tp_name,
> + .mod = NULL,
>   };
>  
>   for_each_kernel_tracepoint(__find_tracepoint_cb, &data);
>  
> + if (!data.tpoint && IS_ENABLED(CONFIG_MODULES)) {
> + for_each_module_tracepoint(__find_tracepoint_module_cb, &data);
> + *tp_mod = data.mod;
> + }
> +
>   return data.tpoint;
>  }
>  
> @@ -996,6 +1027,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])
>   char abuf[MAX_BTF_ARGS_LEN];
>   char *dbuf = NULL;
>   bool is_tracepoint = false;
> + struct module *tp_mod = NULL;
>   struct tracepoint *tpoint = NULL;
>   struct traceprobe_parse_context ctx = {
>   .flags = TPARG_FL_KERNEL | TPARG_FL_FPROBE,
> @@ -1080,7 +1112,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])
>  
>   if (is_tracepoint) {
>   ctx.flags |= TPARG_FL_TPOINT;
> - tpoint = find_tracepoint(symbol);
> + tpoint = find_tracepoint(symbol, &tp_mod);
>   if (!tpoint) {
>   trace_probe_log_set_index(1);
>   trace_probe_log_err(0, NO_TRACEPOINT);
> @@ -1110,8 +1142,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
>   goto out;
>  
>   /* setup a probe */
> - tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
> - argc, is_return);
> + tf = alloc_trace_fprobe(group, event, symbol, tpoint, tp_mod,
> + maxactive, argc, is_return);
>   if (IS_ERR(tf)) {
>   ret = PTR_ERR(tf);
>   /* This must return -ENOMEM, else there is a bug */
> @@ -1119,10 +1151,6 @@ static int __trace_fprobe_create(int argc, const char 
> *argv[])
>   goto out;   /* We know tf is not allocated */
>   }
>  
> - 

Re: [PATCH v2 2/3] tracing/fprobe: Support raw tracepoint events on modules

2024-06-04 Thread Mathieu Desnoyers

On 2024-06-03 19:49, Masami Hiramatsu (Google) wrote:

On Mon, 3 Jun 2024 15:50:55 -0400
Mathieu Desnoyers  wrote:


On 2024-06-01 04:22, Masami Hiramatsu (Google) wrote:

From: Masami Hiramatsu (Google) 

Support raw tracepoint events on modules via fprobe events.
Since it only uses for_each_kernel_tracepoint() to find a tracepoint,
tracepoints on modules are not handled. Thus if a user specifies a
tracepoint on a module, it shows an error.
This adds a new for_each_module_tracepoint() API to the tracepoint
subsystem, and uses it to find tracepoints on modules.


Hi Masami,

Why prevent module unload when a fprobe tracepoint is attached to a
module ? This changes the kernel's behavior significantly just for the
sake of instrumentation.


I don't prevent module unloading all the time, just before registering
tracepoint handler (something like booking a ticket :-) ).
See the last hunk of this patch, it puts the module before exiting
__trace_fprobe_create().


So at least this is transient, but I still have concerns, see below.



As an alternative, LTTng-modules attach/detach to/from modules with the
coming/going notifiers, so the instrumentation gets removed when a
module is unloaded rather than preventing its unload by holding a module
reference count. I would recommend a similar approach for fprobe.


Yes, since tracepoint subsystem provides a notifier API to notify the
tracepoint is gone, fprobe already uses it to find unloading and
unregister the target function. (see __tracepoint_probe_module_cb)


I see.

It looks like there are a few things we could improve there:

1) With your approach, modules need to be already loaded before
attaching an fprobe event to them. This effectively prevents
attaching to any module init code. Is there any way we could allow
this by implementing a module coming notifier in fprobe as well ?
This would require that fprobes are kept around in a data structure
that matches the modules when they are loaded in the coming notifier.

2) Given that the fprobe module going notifier is protected by the
event_mutex, can we use locking rather than reference counting
in fprobe attach to guarantee the target module is not reclaimed
concurrently ? This would remove the transient side-effect of
holding a module reference count which temporarily prevents module
unload.
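
Concretely, the two schemes compare roughly as follows (a sketch with
illustrative names; neither is meant as the series' final code):

#include <linux/module.h>
#include <linux/mutex.h>

/* (a) transient refcount, as in the current patch */
static int attach_with_refcount(struct module *mod)
{
	if (!try_module_get(mod))
		return -ENOENT;
	register_fprobe_on(mod);	/* hypothetical */
	module_put(mod);	/* the going notifier handles teardown */
	return 0;
}

/* (b) locking: the fprobe module going notifier runs under event_mutex,
 * so holding that mutex would exclude a concurrent unload */
static int attach_with_lock(struct module *mod)
{
	int ret = -ENOENT;

	mutex_lock(&event_mutex);
	if (module_is_live(mod)) {
		register_fprobe_on(mod);
		ret = 0;
	}
	mutex_unlock(&event_mutex);
	return ret;
}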

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




[CfP] Confidential Computing Microconference @ LPC 2024

2024-06-04 Thread Jörg Rödel
Hi,

We are pleased to announce the call for presentations for the
Confidential Computing microconference at the Linux Plumbers Conference,
happening in Vienna from September 18th to 20th.

In this microconference we want to discuss ongoing developments around Linux
support for memory encryption and confidential computing in general.

Topics of interest include:


* Support TEE privilege separation extensions (TDX partitioning and AMD
  SEV-SNP VM Privilege Levels) both on the guest and host side.

* Secure IRQ delivery

* Secure VM Service Module (SVSM) support for multiple TEE architectures

* Trusted I/O software architecture

* Live migration of confidential virtual machines

* Remote attestation architectures

* Deployment of Confidential VMs

* Linux as a CVM operating system across hypervisors

* Unification of various confidential computing APIs

Please use the LPC CfP process to submit your proposals. Submissions can
be made via

https://lpc.events/event/18/abstracts/

Make sure to select "Confidential Computing MC" as the track and submit
your session proposal by July 15th. Submissions made after that date
cannot be included in the microconference.

Looking forward to seeing all of you in Vienna in September!

Thanks,

Dhaval and Joerg




Call for Registration - Confidential Computing Microconference @ LPC 2024

2024-06-04 Thread Jörg Rödel
Hi,

We are pleased to announce that the Confidential Computing Microconference has
been accepted into this year's Linux Plumbers Conference, happening from
September 18th to 20th in Vienna, Austria.

With this Call for Registration we want to encourage everyone interested in
this microconference to register for LPC soon and to select Confidential
Computing among the microconferences you plan to attend.

This is important because, due to the high number of accepted
microconferences, the LPC Planning Committee has not yet decided which
microconferences will get a 3-hour vs a 1.5-hour time slot. The more
people express their interest, the better our chances for a longer time
slot.

A Call for Presentations will be sent out shortly in a separate email.

Hope to see you all there!


Dhaval and Joerg



[PATCH v5 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Qais Yousef
{rt, realtime, dl}_{task, prio}() functions' return value is actually
a bool.  Convert their return type to reflect that.

Suggested-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 include/linux/sched/deadline.h |  8 +++-
 include/linux/sched/rt.h   | 16 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 5cb88b748ad6..3a912ab42bb5 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,18 +10,16 @@
 
 #include <linux/sched.h>
 
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
-   if (unlikely(prio < MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_DL_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to DL class. PI-boosted
  * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
  */
-static inline int dl_task(struct task_struct *p)
+static inline bool dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
 }
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index a055dd68a77c..91ef1ef2019f 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,25 +6,21 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline int realtime_prio(int prio)
+static inline bool realtime_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to RT class. PI-boosted
  * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
  */
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
@@ -34,7 +30,7 @@ static inline int rt_task(struct task_struct *p)
  * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline int realtime_task(struct task_struct *p)
+static inline bool realtime_task(struct task_struct *p)
 {
return realtime_prio(p->prio);
 }
-- 
2.34.1




Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Mark Rutland
Hi Steve, Masami,

On Tue, Jun 04, 2024 at 08:18:50AM -0400, Steven Rostedt wrote:
> 
> Masami,
> 
> This series passed all my tests, are you comfortable with me pushing
> them to linux-next?

As a heads-up (and not to block pushing this into next), I just gave
this a spin on arm64 atop v6.10-rc2, and running the selftests I see:

ftrace - function pid filters
(instance)  ftrace - function pid filters

... both go from [PASS] to [FAIL].

Everything else looks good -- I'll go dig into why that's happening.

It's possible that's just something odd with the filesystem I'm using
(e.g. the wnership test failed because this lacks 'stat').

Mark.

> 
> -- Steve
> 
> 
> On Mon, 03 Jun 2024 15:07:04 -0400
> Steven Rostedt  wrote:
> 
> > This is a continuation of the function graph multi user code.
> > I wrote a proof of concept back in 2019 of this code[1] and
> > Masami started cleaning it up. I started from Masami's work v10
> > that can be found here:
> > 
> >  
> > https://lore.kernel.org/linux-trace-kernel/171509088006.162236.7227326999861366050.stgit@devnote2/
> > 
> > This is *only* the code that allows multiple users of function
> > graph tracing. This is not the fprobe work that Masami is working
> > to add on top of it. As Masami took my proof of concept, there
> > was still several things I disliked about that code. Instead of
> > having Masami clean it up even more, I decided to take over on just
> > my code and change it up a bit.
> > 
> > Changes since v2: 
> > https://lore.kernel.org/linux-trace-kernel/20240602033744.563858...@goodmis.org
> > 
> > - Added comments describing which hashes the append and intersect
> >   functions were used for.
> > 
> > - Replaced checks of (NULL or EMPTY_HASH) with ftrace_hash_empty()
> >   helper function.
> > 
> > - Added check at the end of intersect_hash() to convert the hash
> >   to EMPTY hash if it doesn't have any functions.
> > 
> > - Renamed compare_ops() to ops_equal() and return boolean (inverted return
> >   value).
> > 
> > - Broke out __ftrace_hash_move_and_update_ops() to use in both
> >   ftrace_hash_move_and_update_ops() and 
> > ftrace_hash_move_and_update_subops().
> > 
> > Diff between last version at end of this email.
> > 
> > Masami Hiramatsu (Google) (3):
> >   function_graph: Handle tail calls for stack unwinding
> >   function_graph: Use a simple LRU for fgraph_array index number
> >   ftrace: Add multiple fgraph storage selftest
> > 
> > Steven Rostedt (Google) (9):
> >   ftrace: Add subops logic to allow one ops to manage many
> >   ftrace: Allow subops filtering to be modified
> >   function_graph: Add pid tracing back to function graph tracer
> >   function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()
> >   function_graph: Use bitmask to loop on fgraph entry
> >   function_graph: Use static_call and branch to optimize entry function
> >   function_graph: Use static_call and branch to optimize return function
> >   selftests/ftrace: Add function_graph tracer to func-filter-pid test
> >   selftests/ftrace: Add fgraph-multi.tc test
> > 
> > Steven Rostedt (VMware) (15):
> >   function_graph: Convert ret_stack to a series of longs
> >   fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible 
> > by long
> >   function_graph: Add an array structure that will allow multiple 
> > callbacks
> >   function_graph: Allow multiple users to attach to function graph
> >   function_graph: Remove logic around ftrace_graph_entry and return
> >   ftrace/function_graph: Pass fgraph_ops to function graph callbacks
> >   ftrace: Allow function_graph tracer to be enabled in instances
> >   ftrace: Allow ftrace startup flags to exist without dynamic ftrace
> >   function_graph: Have the instances use their own ftrace_ops for 
> > filtering
> >   function_graph: Add "task variables" per task for fgraph_ops
> >   function_graph: Move set_graph_function tests to shadow stack global 
> > var
> >   function_graph: Move graph depth stored data to shadow stack global 
> > var
> >   function_graph: Move graph notrace bit to shadow stack global var
> >   function_graph: Implement fgraph_reserve_data() and 
> > fgraph_retrieve_data()
> >   function_graph: Add selftest for passing local variables
> > 
> > 
> >  include/linux/ftrace.h |   43 +-
> >  include/linux/sched.h  |2 +-
> >  include/linux/trace_recursion.h|   39 -
> >  kernel/trace/fgraph.c  | 1044 
> > 
> >  kernel/trace/ftrace.c  |  522 +-
> >  kernel/trace/ftrace_internal.h |5 +-
> >  kernel/trace/trace.h   |   94 +-
> >  kernel/trace/trace_functions.c |8 +
> >  kernel/trace/trace_functions_graph.c   |   96 

[PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or that it is a
'realtime' task, which includes the RT and DL classes.

Since this has already caused some confusion in discussion [1], it
seemed a cleanup was due.

I define the usage of rt_task() to be tasks that belong to the RT class.
Make sure that it returns true only for the RT class, audit the users, and
replace the ones that required the old behavior with the new realtime_task(),
which returns true for the RT and DL classes. Introduce a similar
realtime_prio() to create the same distinction relative to rt_prio(), and
update the users that required the old behavior to use the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make it more obvious what is the difference
between them. PI-boosted tasks is a factor that must be taken into
account when choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name is
confusing against the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Reviewed-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index 3a494c5d1247..b30870bf7e4a 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include <linux/sched.h>
 
-#define MAX_DL_PRIO	0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO	100
+#define MAX_DL_PRIO	0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
return 0;
 }
 
+/*
+ * Returns true if a task has 

[PATCH v5 0/2] Clean up usage of rt_task()

2024-06-04 Thread Qais Yousef
Make rt_task() return true only for RT class and add new realtime_task() to
return true for RT and DL classes to avoid some confusion the old API can
cause.

No functional changes intended in patch 1. Patch 2 cleans up the return type as
suggested by Steve.

Changes since v4:

* Simplify return of rt/realtime_prio() as the explicit true/false was
  not necessary (Metin).

Changes since v3:

* Make sure the 'new' bool functions return true/false instead of 1/0.
* Drop patch 2 about hrtimer usage of realtime_task() as ongoing
  discussion on v1 indicates its scope outside of this simple cleanup.

Changes since v2:

* Fix one user that should use realtime_task() but remained using
  rt_task() (Sebastian)
* New patch to convert all hrtimer users to use realtime_task_policy()
  (Sebastian)
* Add a new patch to convert return type to bool (Steve)
* Rebase on tip/sched/core and handle a conflict with code shuffle to
  syscalls.c
* Add Reviewed-by Steve

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability about replace some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
v2 discussion: 
https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/
v3 discussion: 
https://lore.kernel.org/lkml/20240527234508.1062360-1-qyou...@layalina.io/
v4 discussion: 
https://lore.kernel.org/lkml/20240601213309.1262206-1-qyou...@layalina.io/

Qais Yousef (2):
  sched/rt: Clean up usage of rt_task()
  sched/rt, dl: Convert functions to return bool

 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h| 14 ++---
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 33 +--
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 53 insertions(+), 31 deletions(-)

-- 
2.34.1




Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread Joe Lawrence
On Tue, Jun 04, 2024 at 04:14:51PM +0800, zhang warden wrote:
> 
> 
> > On Jun 1, 2024, at 03:16, Joe Lawrence  wrote:
> > 
> > Adding these attributes to livepatch sysfs would be expedient and
> > probably easier for us to use, but imposes a recurring burden on us to
> > maintain and test (where is the documentation and kselftest for this new
> > interface?).  Or, we could let the other tools handle all of that for
> > us.
> How this attribute imposes a recurring burden to maintain and test?
> 

Perhaps "responsibility" is a better description.  This would introduce
an attribute that someone's userspace utility is relying on.  It should
at least have a kselftest to ensure a random patch in 2027 doesn't break
it.

> > Perhaps if someone already has an off-the-shelf script that is using
> > ftrace to monitor livepatched code, it could be donated to
> > Documentation/livepatch/?  I can ask our QE folks if they have something
> > like this.
> 
> My intention in introducing this attribute to sysfs is that a user who wants
> to see if this function is called just needs to read this function attribute
> in the livepatch sysfs interface.
> 
> Users who have no experience with ftrace will have trouble getting the
> calling state of the patched function. After all, ftrace is a professional
> kernel tracing tool.
> 
> Adding this attribute will make it easier for us to show whether this patched
> function is called. Actually, I have never tried to use ftrace to trace a
> patched function. Is it OK to use ftrace on a livepatched function?
> 

If you build with CONFIG_SAMPLE_LIVEPATCH=m, you can try it out (or with
one of your own livepatches):

# Convenience variable
  $ SYSFS=/sys/kernel/debug/tracing

# Install the livepatch sample demo module
  $ insmod samples/livepatch/livepatch-sample.ko

# Verify that ftrace can filter on our functions
  $ grep cmdline_proc_show $SYSFS/available_filter_functions
  cmdline_proc_show
  livepatch_cmdline_proc_show [livepatch_sample]

# Turn off any existing tracing and filter functions
  $ echo 0 > $SYSFS/tracing_on
  $ echo > $SYSFS/set_ftrace_filter

# Set up the function tracer and add the kernel's cmdline_proc_show()
# and livepatch-sample's livepatch_cmdline_proc_show()
  $ echo function > $SYSFS/current_tracer
  $ echo cmdline_proc_show >> $SYSFS/set_ftrace_filter
  $ echo livepatch_cmdline_proc_show >> $SYSFS/set_ftrace_filter
  $ cat $SYSFS/set_ftrace_filter
  cmdline_proc_show
  livepatch_cmdline_proc_show [livepatch_sample]

# Turn on the ftracing and force execution of the original and
# livepatched functions
  $ echo 1 > $SYSFS/tracing_on
  $ cat /proc/cmdline 
  this has been live patched

# Checkout out the trace file results
  $ cat $SYSFS/trace
  # tracer: function
  #
  # entries-in-buffer/entries-written: 2/2   #P:8
  #
  #_-=> irqs-off/BH-disabled
  #   / _=> need-resched
  #  | / _---=> hardirq/softirq
  #  || / _--=> preempt-depth
  #  ||| / _-=> migrate-disable
  #   / delay
  #   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
  #  | | |   | | |
   cat-254 [002] ...2.   363.043498: cmdline_proc_show <-seq_read_iter
   cat-254 [002] ...1.   363.043501: livepatch_cmdline_proc_show <-seq_read_iter


The kernel docs provide a lot of explanation of the complete ftracing
interface.  It's pretty powerful stuff, though you may also go the other
direction and look into using the trace-cmd front end to simplify all of
the sysfs manipulation.  Julia Evans wrote a blog [1] a while back that
provides some more examples.

[1] https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/

--
Joe




Re: [PATCH 0/3] tracing: Fix some selftest issues

2024-06-04 Thread Steven Rostedt
On Tue, 4 Jun 2024 23:18:29 +0900
Masami Hiramatsu (Google)  wrote:

> BTW, some of these bootup tests can be ported to KUnit. Do you have a plan to
> use KUnit?

I haven't thought about that. If someone else wants to do it, I will
happily review it ;-)

-- Steve



Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Google
On Tue, 4 Jun 2024 08:18:50 -0400
Steven Rostedt  wrote:

> 
> Masami,
> 
> This series passed all my tests, are you comfortable with me pushing
> them to linux-next?

Yes, this series looks good to me too.

Reviewed-by: Masami Hiramatsu (Google) 

for the series.

Thank you!

> 
> -- Steve
> 
> 
> On Mon, 03 Jun 2024 15:07:04 -0400
> Steven Rostedt  wrote:
> 
> > This is a continuation of the function graph multi user code.
> > I wrote a proof of concept back in 2019 of this code[1] and
> > Masami started cleaning it up. I started from Masami's work v10
> > that can be found here:
> > 
> >  
> > https://lore.kernel.org/linux-trace-kernel/171509088006.162236.7227326999861366050.stgit@devnote2/
> > 
> > This is *only* the code that allows multiple users of function
> > graph tracing. This is not the fprobe work that Masami is working
> > to add on top of it. As Masami took my proof of concept, there
> > were still several things I disliked about that code. Instead of
> > having Masami clean it up even more, I decided to take over on just
> > my code and change it up a bit.
> > 
> > Changes since v2: 
> > https://lore.kernel.org/linux-trace-kernel/20240602033744.563858...@goodmis.org
> > 
> > - Added comments describing which hashes the append and intersect
> >   functions were used for.
> > 
> > - Replaced checks of (NULL or EMPTY_HASH) with ftrace_hash_empty()
> >   helper function.
> > 
> > - Added check at the end of intersect_hash() to convert the hash
> >   to EMPTY hash if it doesn't have any functions.
> > 
> > - Renamed compare_ops() to ops_equal() and made it return a boolean
> >   (inverted return value).
> > 
> > - Broke out __ftrace_hash_move_and_update_ops() to use in both
> >   ftrace_hash_move_and_update_ops() and 
> > ftrace_hash_move_and_update_subops().
> > 
> > Diff between last version at end of this email.
> > 
> > Masami Hiramatsu (Google) (3):
> >   function_graph: Handle tail calls for stack unwinding
> >   function_graph: Use a simple LRU for fgraph_array index number
> >   ftrace: Add multiple fgraph storage selftest
> > 
> > Steven Rostedt (Google) (9):
> >   ftrace: Add subops logic to allow one ops to manage many
> >   ftrace: Allow subops filtering to be modified
> >   function_graph: Add pid tracing back to function graph tracer
> >   function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()
> >   function_graph: Use bitmask to loop on fgraph entry
> >   function_graph: Use static_call and branch to optimize entry function
> >   function_graph: Use static_call and branch to optimize return function
> >   selftests/ftrace: Add function_graph tracer to func-filter-pid test
> >   selftests/ftrace: Add fgraph-multi.tc test
> > 
> > Steven Rostedt (VMware) (15):
> >   function_graph: Convert ret_stack to a series of longs
> >   fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible 
> > by long
> >   function_graph: Add an array structure that will allow multiple 
> > callbacks
> >   function_graph: Allow multiple users to attach to function graph
> >   function_graph: Remove logic around ftrace_graph_entry and return
> >   ftrace/function_graph: Pass fgraph_ops to function graph callbacks
> >   ftrace: Allow function_graph tracer to be enabled in instances
> >   ftrace: Allow ftrace startup flags to exist without dynamic ftrace
> >   function_graph: Have the instances use their own ftrace_ops for 
> > filtering
> >   function_graph: Add "task variables" per task for fgraph_ops
> >   function_graph: Move set_graph_function tests to shadow stack global 
> > var
> >   function_graph: Move graph depth stored data to shadow stack global 
> > var
> >   function_graph: Move graph notrace bit to shadow stack global var
> >   function_graph: Implement fgraph_reserve_data() and 
> > fgraph_retrieve_data()
> >   function_graph: Add selftest for passing local variables
> > 
> > 
> >  include/linux/ftrace.h |   43 +-
> >  include/linux/sched.h  |2 +-
> >  include/linux/trace_recursion.h|   39 -
> >  kernel/trace/fgraph.c  | 1044 
> > 
> >  kernel/trace/ftrace.c  |  522 +-
> >  kernel/trace/ftrace_internal.h |5 +-
> >  kernel/trace/trace.h   |   94 +-
> >  kernel/trace/trace_functions.c |8 +
> >  kernel/trace/trace_functions_graph.c   |   96 +-
> >  kernel/trace/trace_irqsoff.c   |   10 +-
> >  kernel/trace/trace_sched_wakeup.c  |   10 +-
> >  kernel/trace/trace_selftest.c  |  259 -
> >  .../selftests/ftrace/test.d/ftrace/fgraph-multi.tc |  103 ++
> >  .../ftrace/test.d/ftrace/func-filter-pid.tc|   27 +-
> >  14 files changed, 1945 insertions(+), 317 deletions(-)

Re: [PATCH 0/3] tracing: Fix some selftest issues

2024-06-04 Thread Google
On Tue, 4 Jun 2024 09:57:46 -0400
Steven Rostedt  wrote:

> On Fri, 31 May 2024 23:20:47 +0900
> Masami Hiramatsu (Google)  wrote:
> 
> > The major conflict happens when the boot-time test cleans up the kprobe
> > events by
> > 
> >   dyn_events_release_all(&trace_kprobe_ops);
> > 
> > And I removed it with the [3/3] patch in this series :) because it is not
> > needed, and it is not confirmed that there are no other kprobe events when
> > the test starts. Also the warning messages are redundant, so I removed them
> > with [2/3].
> > 
> > So without this [1/3], if we apply [2/3] and [3/3], the problem will be
> > mitigated, but I think the root cause is that these modules are built-in.
> 
> I'm OK with making them module only, but I don't see any selftests for
synthetic events. I think they should have a boot up test as well. If we
> remove them, let's add something to test them at boot up. Then the boot up
> code could clean it up.
> 
> Or change the test module to be a boot up test that cleans itself up if it
> is compiled in as not a module?

Yeah, I think we may need another piece of test code for synthetic events,
one that also triggers the synthetic events.

BTW, some of these bootup tests could be ported to KUnit. Do you have a plan
to use KUnit?
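
For example, a boot-time synthetic event check might turn into a KUnit
case along these lines. This is just a sketch (assuming CONFIG_KUNIT and
CONFIG_SYNTH_EVENTS; the event name and field layout are made up):

#include <kunit/test.h>
#include <linux/module.h>
#include <linux/trace_events.h>

static struct synth_field_desc test_fields[] = {
	{ .type = "u64", .name = "delta" },
};

/* Create and then delete a synthetic event, as the bootup test would. */
static void synth_event_roundtrip(struct kunit *test)
{
	KUNIT_ASSERT_EQ(test, synth_event_create("kunit_synth", test_fields,
						 ARRAY_SIZE(test_fields),
						 THIS_MODULE), 0);
	KUNIT_EXPECT_EQ(test, synth_event_delete("kunit_synth"), 0);
}

static struct kunit_case synth_test_cases[] = {
	KUNIT_CASE(synth_event_roundtrip),
	{}
};

static struct kunit_suite synth_test_suite = {
	.name = "trace-synth-event",
	.test_cases = synth_test_cases,
};
kunit_test_suite(synth_test_suite);

MODULE_LICENSE("GPL");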

Thank you,

> 
> -- Steve
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v2 2/4] perf,uprobes: fix user stack traces in the presence of pending uretprobes

2024-06-04 Thread Google
On Tue, 21 May 2024 18:38:43 -0700
Andrii Nakryiko  wrote:

> When the kernel has pending uretprobes installed, it hijacks the original user
> function return address on the stack with a uretprobe trampoline
> address. There could be multiple such pending uretprobes (either on
> different user functions or on the same recursive one) at any given
> time within the same task.
> 
> This approach interferes with the user stack trace capture logic, which
> would report surprising addresses (like 0x7fffe000) that correspond
> to a special "[uprobes]" section that kernel installs in the target
> process address space for uretprobe trampoline code, while logically it
> should be an address somewhere within the calling function of another
> traced user function.
> 
> This is easy to correct for, though. Uprobes subsystem keeps track of
> pending uretprobes and records original return addresses. This patch
> uses this to do a post-processing step and restore each trampoline
> address entry to the correct original return address. This is done only
> if there are pending uretprobes for current task.
> 
> This is a similar approach to what fprobe/kretprobe infrastructure is
> doing when capturing kernel stack traces in the presence of pending
> return probes.
> 

This looks good to me because this trampoline information is only
managed in uprobes. And it should be provided when unwinding user
stack.

Reviewed-by: Masami Hiramatsu (Google) 

Thank you!

> Reported-by: Riham Selim 
> Signed-off-by: Andrii Nakryiko 
> ---
>  kernel/events/callchain.c | 43 ++-
>  kernel/events/uprobes.c   |  9 
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index 1273be84392c..b17e3323f7f6 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -11,6 +11,7 @@
>  #include <linux/perf_event.h>
>  #include <linux/slab.h>
>  #include <linux/sched/task_stack.h>
> +#include <linux/uprobes.h>
>  
>  #include "internal.h"
>  
> @@ -176,13 +177,51 @@ put_callchain_entry(int rctx)
>   put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
>  }
>  
> +static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
> +int start_entry_idx)
> +{
> +#ifdef CONFIG_UPROBES
> + struct uprobe_task *utask = current->utask;
> + struct return_instance *ri;
> + __u64 *cur_ip, *last_ip, tramp_addr;
> +
> + if (likely(!utask || !utask->return_instances))
> + return;
> +
> + cur_ip = &entry->ip[start_entry_idx];
> + last_ip = &entry->ip[entry->nr - 1];
> + ri = utask->return_instances;
> + tramp_addr = uprobe_get_trampoline_vaddr();
> +
> + /*
> +  * If there are pending uretprobes for the current thread, they are
> +  * recorded in a list inside utask->return_instances; each such
> +  * pending uretprobe replaces traced user function's return address on
> +  * the stack, so when stack trace is captured, instead of seeing
> +  * actual function's return address, we'll have one or many uretprobe
> +  * trampoline addresses in the stack trace, which are not helpful and
> +  * misleading to users.
> +  * So here we go over the pending list of uretprobes, and each
> +  * encountered trampoline address is replaced with actual return
> +  * address.
> +  */
> + while (ri && cur_ip <= last_ip) {
> + if (*cur_ip == tramp_addr) {
> + *cur_ip = ri->orig_ret_vaddr;
> + ri = ri->next;
> + }
> + cur_ip++;
> + }
> +#endif
> +}
> +
>  struct perf_callchain_entry *
>  get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
>  u32 max_stack, bool crosstask, bool add_mark)
>  {
>   struct perf_callchain_entry *entry;
>   struct perf_callchain_entry_ctx ctx;
> - int rctx;
> + int rctx, start_entry_idx;
>  
>   entry = get_callchain_entry();
>   if (!entry)
> @@ -215,7 +254,9 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
>   if (add_mark)
>   perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
>  
> + start_entry_idx = entry->nr;
>   perf_callchain_user(&ctx, regs);
> + fixup_uretprobe_trampoline_entries(entry, start_entry_idx);
>   }
>   }
>  
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index d60d24f0f2f4..1c99380dc89d 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -2149,6 +2149,15 @@ static void handle_trampoline(struct pt_regs *regs)
>  
>   instruction_pointer_set(regs, ri->orig_ret_vaddr);
>   do {
> + /* pop current instance from the stack of pending 
> return instances,
> +  * as it's not pending anymore: we just fixed up 
> original
> +  

Re: [PATCH v2 3/4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-06-04 Thread Google
On Tue, 21 May 2024 18:38:44 -0700
Andrii Nakryiko  wrote:

> When tracing user functions with uprobe functionality, it's common to
> install the probe (e.g., a BPF program) at the first instruction of the
> function. This is often going to be `push %rbp` instruction in function
> preamble, which means that within that function frame pointer hasn't
> been established yet. This leads to consistently missing an actual
> caller of the traced function, because perf_callchain_user() only
> records current IP (capturing traced function) and then following frame
> pointer chain (which would be caller's frame, containing the address of
> caller's caller).

I thought this problem might be solved by sframe.

> 
> So when we have target_1 -> target_2 -> target_3 call chain and we are
> tracing an entry to target_3, captured stack trace will report
> target_1 -> target_3 call chain, which is wrong and confusing.
> 
> This patch proposes a x86-64-specific heuristic to detect `push %rbp`
> instruction being traced.

I like this kind of idea :) But I think this should be done in
user space, not in the kernel, because we cannot always be sure
that the user program uses stack frames.

> If that's the case, with the assumption that
> application is compiled with frame pointers, this instruction would be
> a strong indicator that this is the entry to the function. In that case,
> return address is still pointed to by %rsp, so we fetch it and add to
> stack trace before proceeding to unwind the rest using frame
> pointer-based logic.

Why don't we do this in the user-space BPF program? If it is done
in user space, like perf-probe, I'm OK with it. But I have doubts about
doing this in the kernel; it is not flexible.

More than anything, without a user-space helper to find function
symbols, uprobe does not know where the function entry is. So I'm curious
why you don't do this in user space.

At least, this should be done in a user of uprobes, like trace_uprobe
or bpf.
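
For instance, the BPF side of such a tool could do the fixup itself.
A rough sketch (names are made up; the user-space loader would check
that the probed instruction really is `push %rbp` and splice the saved
address into the captured stack):

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Filled in at function entry; consumed by the user-space part. */
__u64 entry_ret_addr;

SEC("uprobe")
int BPF_KPROBE(at_func_entry)
{
	/* Before `push %rbp` executes, the caller's return address is
	 * still at the top of the stack, so read it from *%rsp. */
	bpf_probe_read_user(&entry_ret_addr, sizeof(entry_ret_addr),
			    (void *)PT_REGS_SP(ctx));
	return 0;
}

char LICENSE[] SEC("license") = "GPL";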


Thank you,

> 
> Signed-off-by: Andrii Nakryiko 
> ---
>  arch/x86/events/core.c  | 20 
>  include/linux/uprobes.h |  2 ++
>  kernel/events/uprobes.c |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 5b0dd07b1ef1..82d5570b58ff 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2884,6 +2884,26 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
>   return;
>  
>   pagefault_disable();
> +
> +#ifdef CONFIG_UPROBES
> + /*
> +  * If we are called from uprobe handler, and we are indeed at the very
> +  * entry to user function (which is normally a `push %rbp` instruction,
> +  * under assumption of application being compiled with frame pointers),
> +  * we should read return address from *regs->sp before proceeding
> +  * to follow frame pointers, otherwise we'll skip immediate caller
> +  * as %rbp is not yet setup.
> +  */
> + if (current->utask) {
> + struct arch_uprobe *auprobe = current->utask->auprobe;
> + u64 ret_addr;
> +
> + if (auprobe && auprobe->insn[0] == 0x55 /* push %rbp */ &&
> + !__get_user(ret_addr, (const u64 __user *)regs->sp))
> + perf_callchain_store(entry, ret_addr);
> + }
> +#endif
> +
>   while (entry->nr < entry->max_stack) {
>   if (!valid_user_frame(fp, sizeof(frame)))
>   break;
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 0c57eec85339..7b785cd30d86 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -76,6 +76,8 @@ struct uprobe_task {
>   struct uprobe   *active_uprobe;
>   unsigned long   xol_vaddr;
>  
> + struct arch_uprobe  *auprobe;
> +
>   struct return_instance  *return_instances;
>   unsigned intdepth;
>  };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 1c99380dc89d..504693845187 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -2072,6 +2072,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>   bool need_prep = false; /* prepare return uprobe, when needed */
>  
>   down_read(&uprobe->register_rwsem);
> + current->utask->auprobe = &uprobe->arch;
>   for (uc = uprobe->consumers; uc; uc = uc->next) {
>   int rc = 0;
>  
> @@ -2086,6 +2087,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>  
>   remove &= rc;
>   }
> + current->utask->auprobe = NULL;
>  
>   if (need_prep && !remove)
>   prepare_uretprobe(uprobe, regs); /* put bp at return */
> -- 
> 2.43.0
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH 0/3] tracing: Fix some selftest issues

2024-06-04 Thread Steven Rostedt
On Fri, 31 May 2024 23:20:47 +0900
Masami Hiramatsu (Google)  wrote:

> The major conflict happens when the boot-time test cleans up the kprobe
> events by
> 
>   dyn_events_release_all(&trace_kprobe_ops);
> 
> And I removed it with the [3/3] patch in this series :) because it is not
> needed, and it is not confirmed that there are no other kprobe events when
> the test starts. Also the warning messages are redundant, so I removed them
> with [2/3].
> 
> So without this [1/3], if we apply [2/3] and [3/3], the problem will be
> mitigated, but I think the root cause is that these modules are built-in.

I'm OK with making them module only, but I don't see any selftests for
synthetic events. I think they should have a boot up test as well. If we
remove them, let's add something to test them at boot up. Then the boot up
code could clean it up.

Or change the test module to be a boot up test that cleans itself up if it
is compiled in as not a module?

-- Steve




Re: [PATCH v4 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Qais Yousef
On 06/03/24 08:33, Metin Kaya wrote:
> On 01/06/2024 10:33 pm, Qais Yousef wrote:
> > {rt, realtime, dl}_{task, prio}() functions return value is actually
> > a bool.  Convert their return type to reflect that.
> > 
> > Suggested-by: Steven Rostedt (Google) 
> > Signed-off-by: Qais Yousef 
> > ---
> >   include/linux/sched/deadline.h |  8 
> >   include/linux/sched/rt.h   | 16 
> >   2 files changed, 12 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> > index 5cb88b748ad6..f2053f46f1d5 100644
> > --- a/include/linux/sched/deadline.h
> > +++ b/include/linux/sched/deadline.h
> > @@ -10,18 +10,18 @@
> >   #include 
> > -static inline int dl_prio(int prio)
> > +static inline bool dl_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_DL_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> 
> Nit: `return unlikely(prio < MAX_DL_PRIO)` would be simpler.
> The same can be applied to rt_prio() and realtime_prio(). This would make
> {dl, rt, realtime}_task() one-liners. Maybe further simplification can be
> done.

Fair. Thanks.
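
Something like this, then (a quick sketch of the suggestion for
dl_prio(); rt_prio() and realtime_prio() would follow the same pattern):

static inline bool dl_prio(int prio)
{
	return unlikely(prio < MAX_DL_PRIO);
}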

> 
> >   }
> >   /*
> >* Returns true if a task has a priority that belongs to DL class. 
> > PI-boosted
> >* tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
> >*/
> > -static inline int dl_task(struct task_struct *p)
> > +static inline bool dl_task(struct task_struct *p)
> >   {
> > return dl_prio(p->prio);
> >   }
> > diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
> > index a055dd68a77c..efbdd2e57765 100644
> > --- a/include/linux/sched/rt.h
> > +++ b/include/linux/sched/rt.h
> > @@ -6,25 +6,25 @@
> >   struct task_struct;
> > -static inline int rt_prio(int prio)
> > +static inline bool rt_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> >   }
> > -static inline int realtime_prio(int prio)
> > +static inline bool realtime_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_RT_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> >   }
> >   /*
> >* Returns true if a task has a priority that belongs to RT class. 
> > PI-boosted
> >* tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
> >*/
> > -static inline int rt_task(struct task_struct *p)
> > +static inline bool rt_task(struct task_struct *p)
> >   {
> > return rt_prio(p->prio);
> >   }
> > @@ -34,7 +34,7 @@ static inline int rt_task(struct task_struct *p)
> >* PI-boosted tasks will return true. Use realtime_task_policy() to ignore
> >* PI-boosted tasks.
> >*/
> > -static inline int realtime_task(struct task_struct *p)
> > +static inline bool realtime_task(struct task_struct *p)
> >   {
> > return realtime_prio(p->prio);
> >   }
> 



Re: [PATCH v3 00/27] function_graph: Allow multiple users for function graph tracing

2024-06-04 Thread Steven Rostedt


Masami,

This series passed all my tests, are you comfortable with me pushing
them to linux-next?

-- Steve


On Mon, 03 Jun 2024 15:07:04 -0400
Steven Rostedt  wrote:

> This is a continuation of the function graph multi user code.
> I wrote a proof of concept back in 2019 of this code[1] and
> Masami started cleaning it up. I started from Masami's work v10
> that can be found here:
> 
>  
> https://lore.kernel.org/linux-trace-kernel/171509088006.162236.7227326999861366050.stgit@devnote2/
> 
> This is *only* the code that allows multiple users of function
> graph tracing. This is not the fprobe work that Masami is working
> to add on top of it. As Masami took my proof of concept, there
> were still several things I disliked about that code. Instead of
> having Masami clean it up even more, I decided to take over on just
> my code and change it up a bit.
> 
> Changes since v2: 
> https://lore.kernel.org/linux-trace-kernel/20240602033744.563858...@goodmis.org
> 
> - Added comments describing which hashes the append and intersect
>   functions were used for.
> 
> - Replaced checks of (NULL or EMPTY_HASH) with ftrace_hash_empty()
>   helper function.
> 
> - Added check at the end of intersect_hash() to convert the hash
>   to EMPTY hash if it doesn't have any functions.
> 
> - Renamed compare_ops() to ops_equal() and made it return a boolean
>   (inverted return value).
> 
> - Broke out __ftrace_hash_move_and_update_ops() to use in both
>   ftrace_hash_move_and_update_ops() and ftrace_hash_move_and_update_subops().
> 
> Diff between last version at end of this email.
> 
> Masami Hiramatsu (Google) (3):
>   function_graph: Handle tail calls for stack unwinding
>   function_graph: Use a simple LRU for fgraph_array index number
>   ftrace: Add multiple fgraph storage selftest
> 
> Steven Rostedt (Google) (9):
>   ftrace: Add subops logic to allow one ops to manage many
>   ftrace: Allow subops filtering to be modified
>   function_graph: Add pid tracing back to function graph tracer
>   function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()
>   function_graph: Use bitmask to loop on fgraph entry
>   function_graph: Use static_call and branch to optimize entry function
>   function_graph: Use static_call and branch to optimize return function
>   selftests/ftrace: Add function_graph tracer to func-filter-pid test
>   selftests/ftrace: Add fgraph-multi.tc test
> 
> Steven Rostedt (VMware) (15):
>   function_graph: Convert ret_stack to a series of longs
>   fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by 
> long
>   function_graph: Add an array structure that will allow multiple 
> callbacks
>   function_graph: Allow multiple users to attach to function graph
>   function_graph: Remove logic around ftrace_graph_entry and return
>   ftrace/function_graph: Pass fgraph_ops to function graph callbacks
>   ftrace: Allow function_graph tracer to be enabled in instances
>   ftrace: Allow ftrace startup flags to exist without dynamic ftrace
>   function_graph: Have the instances use their own ftrace_ops for 
> filtering
>   function_graph: Add "task variables" per task for fgraph_ops
>   function_graph: Move set_graph_function tests to shadow stack global var
>   function_graph: Move graph depth stored data to shadow stack global var
>   function_graph: Move graph notrace bit to shadow stack global var
>   function_graph: Implement fgraph_reserve_data() and 
> fgraph_retrieve_data()
>   function_graph: Add selftest for passing local variables
> 
> 
>  include/linux/ftrace.h |   43 +-
>  include/linux/sched.h  |2 +-
>  include/linux/trace_recursion.h|   39 -
>  kernel/trace/fgraph.c  | 1044 
> 
>  kernel/trace/ftrace.c  |  522 +-
>  kernel/trace/ftrace_internal.h |5 +-
>  kernel/trace/trace.h   |   94 +-
>  kernel/trace/trace_functions.c |8 +
>  kernel/trace/trace_functions_graph.c   |   96 +-
>  kernel/trace/trace_irqsoff.c   |   10 +-
>  kernel/trace/trace_sched_wakeup.c  |   10 +-
>  kernel/trace/trace_selftest.c  |  259 -
>  .../selftests/ftrace/test.d/ftrace/fgraph-multi.tc |  103 ++
>  .../ftrace/test.d/ftrace/func-filter-pid.tc|   27 +-
>  14 files changed, 1945 insertions(+), 317 deletions(-)
>  create mode 100644 
> tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> 
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 41fabc6d30e4..da7e6abf48b4 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -3170,7 +3170,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
>  /* Simply make a copy of @src and 

Re: [PATCH 2/2] ARM: dts: qcom: Add initial support for HTC One (M8)

2024-06-04 Thread Konrad Dybcio




On 6/3/24 08:28, Alexandre Messier via B4 Relay wrote:

From: Alexandre Messier 

Add initial device tree for the HTC One (M8) smartphone.

Initial support includes:
  - eMMC
  - Power button
  - USB
  - Vibrator
  - Volume buttons (GPIO)
  - Wi-Fi

Signed-off-by: Alexandre Messier 
---


Reviewed-by: Konrad Dybcio 

Konrad



Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread zhang warden


> My intention in introducing this attitude to sysfs is that a user who wants to
> see if this function is called just needs to look at this function attribute in
> the livepatch sysfs interface.
> 

Sorry bros,
There is a typo in my wording: attitude -> attribute.

Autocomplete made it wrong… lol.

Wardenjohn


Re: [PATCH v2 4/4] selftests/bpf: add test validating uprobe/uretprobe stack traces

2024-06-04 Thread Jiri Olsa
On Tue, May 21, 2024 at 06:38:45PM -0700, Andrii Nakryiko wrote:
> Add a set of tests to validate that stack traces captured from or in the
> presence of active uprobes and uretprobes are valid and complete.
> 
> For this we use BPF programs that are installed either on entry or exit
> of a user function, plus a deep-nested USDT. One of the target functions
> (target_1) is recursive to generate two different entries in the stack
> trace for the same uprobe/uretprobe, testing potential edge conditions.
> 
> Without fixes in this patch set, we get something like this for one of
> the scenarios:
> 
>  caller: 0x758fff - 0x7595ab
>  target_1: 0x758fd5 - 0x758fff
>  target_2: 0x758fca - 0x758fd5
>  target_3: 0x758fbf - 0x758fca
>  target_4: 0x758fb3 - 0x758fbf
>  ENTRY #0: 0x758fb3 (in target_4)
>  ENTRY #1: 0x758fd3 (in target_2)
>  ENTRY #2: 0x758ffd (in target_1)
>  ENTRY #3: 0x7fffe000
>  ENTRY #4: 0x7fffe000
>  ENTRY #5: 0x6f8f39
>  ENTRY #6: 0x6fa6f0
>  ENTRY #7: 0x7f403f229590
> 
> Entry #3 and #4 (0x7fffe000) are uretprobe trampoline addresses
> which obscure actual target_1 and another target_1 invocations. Also
> note that between entry #0 and entry #1 we are missing an entry for
> target_3, which is fixed in patch #2.
> 
> With all the fixes, we get desired full stack traces:
> 
>  caller: 0x758fff - 0x7595ab
>  target_1: 0x758fd5 - 0x758fff
>  target_2: 0x758fca - 0x758fd5
>  target_3: 0x758fbf - 0x758fca
>  target_4: 0x758fb3 - 0x758fbf
>  ENTRY #0: 0x758fb7 (in target_4)
>  ENTRY #1: 0x758fc8 (in target_3)
>  ENTRY #2: 0x758fd3 (in target_2)
>  ENTRY #3: 0x758ffd (in target_1)
>  ENTRY #4: 0x758ff3 (in target_1)
>  ENTRY #5: 0x75922c (in caller)
>  ENTRY #6: 0x6f8f39
>  ENTRY #7: 0x6fa6f0
>  ENTRY #8: 0x7f986adc4cd0
> 
> Now there is a logical and complete sequence of function calls.
> 
> Signed-off-by: Andrii Nakryiko 

Acked-by: Jiri Olsa 

jirka

> ---
>  .../bpf/prog_tests/uretprobe_stack.c  | 186 ++
>  .../selftests/bpf/progs/uretprobe_stack.c |  96 +
>  2 files changed, 282 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
>  create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_stack.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c b/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
> new file mode 100644
> index ..6deb8d560ddd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
> @@ -0,0 +1,186 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +
> +#include 
> +#include "uretprobe_stack.skel.h"
> +#include "../sdt.h"
> +
> +/* We set up target_1() -> target_2() -> target_3() -> target_4() -> USDT()
> + * call chain, each being traced by our BPF program. On entry or return from
> + * each target_*() we are capturing user stack trace and recording it in
> + * global variable, so that user space part of the test can validate it.
> + *
> + * Note, we put each target function into a custom section to get those
> + * __start_XXX/__stop_XXX symbols, generated by linker for us, which allow us
> + * to know address range of those functions
> + */
> +__attribute__((section("uprobe__target_4")))
> +__weak int target_4(void)
> +{
> + STAP_PROBE1(uretprobe_stack, target, 42);
> + return 42;
> +}
> +
> +extern const void *__start_uprobe__target_4;
> +extern const void *__stop_uprobe__target_4;
> +
> +__attribute__((section("uprobe__target_3")))
> +__weak int target_3(void)
> +{
> + return target_4();
> +}
> +
> +extern const void *__start_uprobe__target_3;
> +extern const void *__stop_uprobe__target_3;
> +
> +__attribute__((section("uprobe__target_2")))
> +__weak int target_2(void)
> +{
> + return target_3();
> +}
> +
> +extern const void *__start_uprobe__target_2;
> +extern const void *__stop_uprobe__target_2;
> +
> +__attribute__((section("uprobe__target_1")))
> +__weak int target_1(int depth)
> +{
> + if (depth < 1)
> + return 1 + target_1(depth + 1);
> + else
> + return target_2();
> +}
> +
> +extern const void *__start_uprobe__target_1;
> +extern const void *__stop_uprobe__target_1;
> +
> +extern const void *__start_uretprobe_stack_sec;
> +extern const void *__stop_uretprobe_stack_sec;
> +
> +struct range {
> + long start;
> + long stop;
> +};
> +
> +static struct range targets[] = {
> + {}, /* we want target_1 to map to target[1], so need 1-based indexing */
> + { (long)&__start_uprobe__target_1, (long)&__stop_uprobe__target_1 },
> + { (long)&__start_uprobe__target_2, (long)&__stop_uprobe__target_2 },
> + { (long)&__start_uprobe__target_3, (long)&__stop_uprobe__target_3 },
> + { (long)&__start_uprobe__target_4, (long)&__stop_uprobe__target_4 },
> +};
> +
> +static struct range caller = {
> + (long)&__start_uretprobe_stack_sec,
> + (long)&__stop_uretprobe_stack_sec,
> +};
> +
> +static void 

Re: [PATCH v2 3/4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-06-04 Thread Jiri Olsa
On Tue, May 21, 2024 at 06:38:44PM -0700, Andrii Nakryiko wrote:
> When tracing user functions with uprobe functionality, it's common to
> install the probe (e.g., a BPF program) at the first instruction of the
> function. This is often going to be `push %rbp` instruction in function
> preamble, which means that within that function frame pointer hasn't
> been established yet. This leads to consistently missing an actual
> caller of the traced function, because perf_callchain_user() only
> records current IP (capturing traced function) and then following frame
> pointer chain (which would be caller's frame, containing the address of
> caller's caller).
> 
> So when we have target_1 -> target_2 -> target_3 call chain and we are
> tracing an entry to target_3, captured stack trace will report
> target_1 -> target_3 call chain, which is wrong and confusing.
> 
> This patch proposes a x86-64-specific heuristic to detect `push %rbp`
> instruction being traced. If that's the case, with the assumption that
> application is compiled with frame pointers, this instruction would be
> a strong indicator that this is the entry to the function. In that case,
> return address is still pointed to by %rsp, so we fetch it and add to
> stack trace before proceeding to unwind the rest using frame
> pointer-based logic.
> 
> Signed-off-by: Andrii Nakryiko 
> ---
>  arch/x86/events/core.c  | 20 
>  include/linux/uprobes.h |  2 ++
>  kernel/events/uprobes.c |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 5b0dd07b1ef1..82d5570b58ff 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2884,6 +2884,26 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
>   return;
>  
>   pagefault_disable();
> +
> +#ifdef CONFIG_UPROBES
> + /*
> +  * If we are called from uprobe handler, and we are indeed at the very
> +  * entry to user function (which is normally a `push %rbp` instruction,
> +  * under assumption of application being compiled with frame pointers),
> +  * we should read return address from *regs->sp before proceeding
> +  * to follow frame pointers, otherwise we'll skip immediate caller
> +  * as %rbp is not yet setup.
> +  */
> + if (current->utask) {
> + struct arch_uprobe *auprobe = current->utask->auprobe;
> + u64 ret_addr;
> +
> + if (auprobe && auprobe->insn[0] == 0x55 /* push %rbp */ &&
> + !__get_user(ret_addr, (const u64 __user *)regs->sp))
> + perf_callchain_store(entry, ret_addr);
> + }
> +#endif
> +
>   while (entry->nr < entry->max_stack) {
>   if (!valid_user_frame(fp, sizeof(frame)))
>   break;
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 0c57eec85339..7b785cd30d86 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -76,6 +76,8 @@ struct uprobe_task {
>   struct uprobe   *active_uprobe;
>   unsigned long   xol_vaddr;
>  
> + struct arch_uprobe  *auprobe;

I wonder if we could use active_uprobe for this?

jirka

> +
>   struct return_instance  *return_instances;
>   unsigned intdepth;
>  };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 1c99380dc89d..504693845187 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -2072,6 +2072,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>   bool need_prep = false; /* prepare return uprobe, when needed */
>  
>   down_read(&uprobe->register_rwsem);
> + current->utask->auprobe = &uprobe->arch;
>   for (uc = uprobe->consumers; uc; uc = uc->next) {
>   int rc = 0;
>  
> @@ -2086,6 +2087,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>  
>   remove &= rc;
>   }
> + current->utask->auprobe = NULL;
>  
>   if (need_prep && !remove)
>   prepare_uretprobe(uprobe, regs); /* put bp at return */
> -- 
> 2.43.0
> 
> 



Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread zhang warden



> On May 31, 2024, at 22:06, Miroslav Benes  wrote:
> 
>> And for the unlikely branch, won't the compiler compile this branch
>> into a cold branch that will do no harm to the function's performance?
> 
> The test (cmp insn or something like that) still needs to be there. Since 
> there is only a simple assignment in the branch, the compiler may just 
> choose not to have a cold branch in this case. The only difference is in 
> which case you would jump here. You can see for yourself (and prove me 
> wrong if it comes to it).
> 
> Miroslav

Hi Miroslav,

Yes, per your suggestion, more tests should be done here.
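
For anyone who wants to check the codegen, a standalone snippet like
this (just a sketch of the pattern with a hypothetical flag) compiled
with gcc -O2 -S shows where the store and the branch end up:

#define unlikely(x) __builtin_expect(!!(x), 0)

static _Bool called;

void handler(void)
{
	/* The test is always emitted; only the branch layout can change
	 * based on the unlikely() hint. */
	if (unlikely(!called))
		called = 1;
}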

Regards,
Wardenjohn





Re: [PATCH] livepatch: introduce klp_func called interface

2024-06-04 Thread zhang warden



> On Jun 1, 2024, at 03:16, Joe Lawrence  wrote:
> 
> Adding these attributes to livepatch sysfs would be expedient and
> probably easier for us to use, but imposes a recurring burden on us to
> maintain and test (where is the documentation and kselftest for this new
> interface?).  Or, we could let the other tools handle all of that for
> us.
How does this attribute impose a recurring burden to maintain and test?

> Perhaps if someone already has an off-the-shelf script that is using
> ftrace to monitor livepatched code, it could be donated to
> Documentation/livepatch/?  I can ask our QE folks if they have something
> like this.

My intention in introducing this attitude to sysfs is that a user who wants to
see if this function is called just needs to look at this function attribute in
the livepatch sysfs interface.
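
In code, the attribute would boil down to something like this sketch for
kernel/livepatch/core.c (the "called" flag is the new field this patch
proposes on struct klp_func):

static ssize_t called_show(struct kobject *kobj,
			   struct kobj_attribute *attr, char *buf)
{
	struct klp_func *func = container_of(kobj, struct klp_func, kobj);

	return sysfs_emit(buf, "%d\n", func->called);
}
static struct kobj_attribute called_kobj_attr = __ATTR_RO(called);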

Users who have no experience with ftrace will have trouble getting the
calling state of the patched function. After all, ftrace is a professional
kernel tracing tool.

Adding this attribute will make it easier for us to show whether this patched
function has been called. Actually, I have never tried to use ftrace to trace a
patched function. Is it OK to use ftrace on a livepatched function?

Regards,
Wardenjohn






Re: [RESEND PATCH v5 3/7] dt-bindings: remoteproc: Add processor identifier property

2024-06-04 Thread Arnaud POULIQUEN
Hello Rob

On 6/3/24 16:35, Rob Herring wrote:
> On Tue, May 21, 2024 at 02:24:54PM +0200, Arnaud Pouliquen wrote:
>> Add the "st,proc-id" property allowing to identify the remote processor.
>> This ID is used to define an unique ID, common between Linux, U-boot and
>> OP-TEE to identify a coprocessor.
>> This ID will be used in request to OP-TEE remoteproc Trusted Application
>> to specify the remote processor.
>>
>> Signed-off-by: Arnaud Pouliquen 
>> ---
>>  .../devicetree/bindings/remoteproc/st,stm32-rproc.yaml | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml 
>> b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> index 36ea54016b76..409123cd4667 100644
>> --- a/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> +++ b/Documentation/devicetree/bindings/remoteproc/st,stm32-rproc.yaml
>> @@ -48,6 +48,10 @@ properties:
>>- description: The offset of the hold boot setting register
>>- description: The field mask of the hold boot
>>  
>> +  st,proc-id:
>> +description: remote processor identifier
>> +$ref: /schemas/types.yaml#/definitions/uint32
>> +
>>st,syscfg-tz:
>>  deprecated: true
>>  description:
>> @@ -182,6 +186,8 @@ allOf:
>>  st,syscfg-holdboot: false
>>  reset-names: false
>>  resets: false
>> +  required:
>> +- st,proc-id
> 
> New required properties are an ABI break. If that is okay, explain why 
> in the commit message.

This commit complements patch 2/7. The required property is associated
with a new compatible. I created it as a separate commit since you had
already reviewed 2/7.

Perhaps it would be clearer if I merged the two.

Thanks,
Arnaud

> 
>>  
>>  additionalProperties: false
>>  
>> @@ -220,6 +226,7 @@ examples:
>>reg = <0x1000 0x4>,
>>  <0x3000 0x4>,
>>  <0x3800 0x1>;
>> +  st,proc-id = <0>;
>>   st,syscfg-rsc-tbl = <&tamp 0x144 0xFFFFFFFF>;
>>   st,syscfg-m4-state = <&tamp 0x148 0xFFFFFFFF>;
>>  };
>> -- 
>> 2.25.1
>>