Re: [PATCH] nvdimm-btt: fix a potential memleak in btt_freelist_init

2023-12-08 Thread Ira Weiny
dinghao.liu@ wrote:
> > Dave Jiang wrote:

[snip]

> > That said, this patch does not completely stop freelist from leaking in the
> > following error path.
> > 
> > discover_arenas()
> > btt_freelist_init() -> ok (memory allocated)
> > btt_rtt_init() -> fail
> > goto out;
> > (leak because arena is not yet on btt->arena_list)
> > OR
> > btt_maplocks_init() -> fail
> > goto out;
> > (leak because arena is not yet on btt->arena_list)
> > 
> 
> Thanks for pointing out this issue! I rechecked discover_arenas() and found
> that btt_rtt_init() may also trigger a memleak for the same reason as
> btt_freelist_init(). Also, I checked another call trace:
> 
> btt_init() -> btt_meta_init() -> btt_maplocks_init()
> 
> I think there is a memleak if btt_maplocks_init() succeeds but an error
> happens in btt_init() after btt_meta_init() (e.g., when btt_blk_init()
> returns an error). Therefore, we may need to fix three functions.

Yea I think we need to trace this code better.  This is why devm_ is nice for
memory allocated for the life of the device.

> 
> > This error could be fixed by adding to arena_list earlier but devm_*()
> > also takes care of this without having to worry about that logic.
> > 
> > In normal operation all of this memory can be freed with the
> > corresponding devm_kfree() and/or devm_add_action_*() calls if arenas come
> > and go.  I'm not sure off the top of my head.
> > 
> > In addition, looking at this code, discover_arenas() could make use of
> > scope-based management for struct btt_sb *super!
> > 
> > Dinghao would you be willing to submit a series of 2 or 3 patches to fix
> > the above issues?
> > 
> 
> Sure. Currently I plan to send 2 patches as follows:
> 1. Using devm_kcalloc() to replace kcalloc() in btt_freelist_init(),
>btt_rtt_init(), and btt_maplocks_init(), and removing the corresponding
>kfree() calls in free_arenas(). I checked some uses of devm_kcalloc() and
>it seems that we need not call devm_kfree(). The memory is automatically
>freed on driver detach, right?

On device put, yes.  So if these allocations are scoped to the life of the
device there would be no reason to call devm_kfree() on them at all.  I was
not sure if they got reallocated at some point or not.
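
For concreteness, a minimal sketch of what the conversion could look like in
btt_freelist_init() (helper and field names as I recall them from
drivers/nvdimm/btt.c -- double-check against the tree):

	static int btt_freelist_init(struct arena_info *arena)
	{
		...
		/* was kcalloc(); devm ties the lifetime to the device */
		arena->freelist = devm_kcalloc(to_dev(arena), arena->nfree,
					       sizeof(struct free_entry),
					       GFP_KERNEL);
		if (!arena->freelist)
			return -ENOMEM;
		...
	}

with the matching kfree(arena->freelist) dropped from free_arenas().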

> 2. Using scope-based management for struct btt_sb *super (not a bug,
>but it could improve the code).

Good!
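
Something like this is what I had in mind (sketch only, assuming the
linux/cleanup.h helpers):

	#include <linux/cleanup.h>

	static int discover_arenas(struct btt *btt)
	{
		struct btt_sb *super __free(kfree) =
				kzalloc(sizeof(*super), GFP_KERNEL);

		if (!super)
			return -ENOMEM;

		/* every return path below now frees 'super' automatically,
		 * so the explicit kfree(super) calls on the error paths
		 * can go away */
		...
	}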

> 
> I'm not quite sure whether my understanding or bug-fixing plan is correct.
> If there are any issues, please correct me, thanks!

The above sounds right.
Ira



Re: [PATCH 2/6] module: add CONFIG_BUILTIN_RANGES option

2023-12-08 Thread Google
On Fri,  8 Dec 2023 00:07:48 -0500
Kris Van Hees  wrote:

> The CONFIG_BUILTIN_RANGES option controls whether offset range data is
> generated for kernel modules that are built into the kernel image.
> 
> Signed-off-by: Kris Van Hees 
> Reviewed-by: Nick Alcock 
> Reviewed-by: Alan Maguire 
> ---
>  kernel/module/Kconfig | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 33a2e991f608..0798439b11ac 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -389,4 +389,21 @@ config MODULES_TREE_LOOKUP
>   def_bool y
>   depends on PERF_EVENTS || TRACING || CFI_CLANG
>  
> +config BUILTIN_RANGES

BUILTIN_MODULE_RANGES ?
BTW, even with CONFIG_MODULES=n, module code can still be built into the
kernel. So should this option be visible even if CONFIG_MODULES=n?

Thank you,

> + bool "Generate address range information for builtin modules"
> + depends on VMLINUX_MAP
> + help
> +   When modules are built into the kernel, there will be no module name
> +   associated with its symbols in /proc/kallsyms.  Tracers may want to
> +   identify symbols by module name and symbol name regardless of whether
> +   the module is configured as loadable or not.
> +
> +   This option generates modules.builtin.ranges in the build tree with
> +   offset ranges (per ELF section) for the module(s) they belong to.
> +   It also records an anchor symbol to determine the load address of the
> +   section.
> +
> +   It is fully compatible with CONFIG_RANDOMIZE_BASE and similar late-
> +   address-modification options.
> +
>  endif # MODULES
> -- 
> 2.42.0
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v3 5/5] input/touchscreen: imagis: add support for IST3032C

2023-12-08 Thread Karel Balej
Markuss,

thank you for the review.

> > diff --git a/drivers/input/touchscreen/imagis.c 
> > b/drivers/input/touchscreen/imagis.c
> > index 84a02672ac47..41f28e6e9cb1 100644
> > --- a/drivers/input/touchscreen/imagis.c
> > +++ b/drivers/input/touchscreen/imagis.c
> > @@ -35,6 +35,8 @@
> >   #define IST3038B_REG_CHIPID   0x30
> >   #define IST3038B_WHOAMI   0x30380b
> >   
> > +#define IST3032C_WHOAMI0x32c
> > +

> Perhaps it should be ordered in alphabetic/alphanumeric order;
> alternatively, the chip ID values could be grouped.

Here I followed suit and just started a new section for the new chip,
except there is only one entry. I do agree that it would be better to
sort the chips alphanumerically, and I am actually surprised that I
didn't do that - but now I see that the chips that you added are not
sorted either, so that might be why.

I propose swapping the order of the sections, putting 32C first, then 38B,
and 38C at the end (from top to bottom). The chip ID values could then
still be grouped in a new section, but I think I would prefer to keep them
as parts of the respective sections as they are now, although it is in no
way a strong preference.

Please let me know whether you agree with this or have a different
preference. And if the former, please confirm that I can add your
Reviewed-by trailer to the patch modified in such a way.

Best regards,
K. B.



Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

2023-12-08 Thread Verma, Vishal L
On Fri, 2023-12-08 at 12:36 +0100, David Hildenbrand wrote:
> On 07.12.23 05:36, Vishal Verma wrote:
[..]
> > 
> > +
> > +What:  /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> > +Date:  October, 2023
> > +KernelVersion: v6.8
> > +Contact:   nvd...@lists.linux.dev
> > +Description:
> > +   (RW) Control the memmap_on_memory setting if the dax device
> > +   were to be hotplugged as system memory. This determines 
> > whether
> > +   the 'altmap' for the hotplugged memory will be placed on the
> > +   device being hotplugged (memmap_on+memory=1) or if it will 
> > be
> > +   placed on regular memory (memmap_on_memory=0). This 
> > attribute
> > +   must be set before the device is handed over to the 'kmem'
> > +   driver (i.e.  hotplugged into system-ram).
> > 
> 
> Should we note the dependency on other factors as given in
> mhp_supports_memmap_on_memory(), especially the system-wide setting and
> some weird kernel configurations?
> 
Yep good idea, I'll make a note of those for v3.


Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

2023-12-08 Thread Verma, Vishal L
On Fri, 2023-12-08 at 11:13 +0800, Huang, Ying wrote:
> Vishal Verma  writes:
> 
[..]
> > 
> > +
> > +static ssize_t memmap_on_memory_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t len)
> > +{
> > +   struct dev_dax *dev_dax = to_dev_dax(dev);
> > +   struct dax_region *dax_region = dev_dax->region;
> > +   ssize_t rc;
> > +   bool val;
> > +
> > +   rc = kstrtobool(buf, &val);
> > +   if (rc)
> > +   return rc;
> > +
> > +   if (dev_dax->memmap_on_memory == val)
> > +   return len;
> > +
> > +   device_lock(dax_region->dev);
> > +   if (!dax_region->dev->driver) {
> 
> This still doesn't look right.  Can we check whether the current driver
> is kmem?  And only allow change if it's not kmem?

Ah yes I lost track of this todo between revisions when I split this
out. Let me fix that for the next revision.
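
Something along these lines, perhaps (untested sketch; matching on the
driver name "kmem" via dev->driver->name is my assumption of the cleanest
check here):

	device_lock(dax_region->dev);
	if (dev->driver && strcmp(dev->driver->name, "kmem") == 0) {
		/* already handed over to kmem - too late to flip the knob */
		device_unlock(dax_region->dev);
		return -EBUSY;
	}
	dev_dax->memmap_on_memory = val;
	device_unlock(dax_region->dev);

	return len;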



Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

2023-12-08 Thread Verma, Vishal L
On Fri, 2023-12-08 at 09:20 +, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 08/12/2023 03:25, Verma, Vishal L wrote:
> > On Thu, 2023-12-07 at 08:29 +, Zhijian Li (Fujitsu) wrote:
> > > Hi Vishal,
> > > 
> > > 
> > > On 07/12/2023 12:36, Vishal Verma wrote:
> > > > +
> > > > +What:  /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> > > > +Date:  October, 2023
> > > > +KernelVersion: v6.8
> > > > +Contact:   nvd...@lists.linux.dev
> > > > +Description:
> > > > +   (RW) Control the memmap_on_memory setting if the dax 
> > > > device
> > > > +   were to be hotplugged as system memory. This determines 
> > > > whether
> > > > +   the 'altmap' for the hotplugged memory will be placed 
> > > > on the
> > > > +   device being hotplugged (memmap_on+memory=1) or if it 
> > > > will be
> > > 
> > > s/memmap_on+memory=1/memmap_on_memory=1
> > 
> > Thanks, will fix.
> > > 
> > > 
> > > I have a question here
> > > 
> > > What relationship about memmap_on_memory and 'ndctl-create-namespace
> > > -M' option which
> > > specifies where is the vmemmap backed memory.
> > > I'm confused that memmap_on_memory=1 and '-M dev' are the same for
> > > nvdimm devdax mode ?
> > > 
> > The main difference is that memory that comes from non-nvdimm sources,
> > such as hmem or cxl, doesn't get a chance to set up the altmap the way
> > pmem can with '-M dev'. For these, memmap_on_memory does this as part
> > of the memory hotplug.
> 
> Thanks for your explanation.
> (I wrongly thought nvdimm.kmem was also controlled by '-M dev' before)
> 
> feel free add:
> Tested-by: Li Zhijian 

Thank you Zhijian!



trace_event names with more than NAME_MAX chars

2023-12-08 Thread Beau Belgrave
While developing some unrelated features I happened to create a
trace_event with a name longer than NAME_MAX (255) characters. The
creation worked, but tracefs would then hang any task that tried to list
the trace_event's directory or remove it.

I followed the code down and found the reason: eventfs calls
simple_lookup(), and if that fails, it still tries to create the dentry.
In this case DCACHE_PAR_LOOKUP gets set and is never cleared. This causes
d_wait_lookup() to loop forever, since that flag is used in d_in_lookup().

Both tracefs and eventfs use simple_lookup() and it fails for
dentries that exceed NAME_MAX. Should we even allow trace_events to
be created that exceed this limit? Or should tracefs/eventfs allow
this but somehow represent these differently?
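
If we decide to disallow them, a guard at creation time might be as simple
as (hypothetical placement, e.g. somewhere in the trace_event registration
path, before any tracefs/eventfs entries are instantiated):

	if (strlen(name) > NAME_MAX)
		return -ENAMETOOLONG;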

I have a fix that appears to work for me, but I'm unsure whether other
locations need the same treatment (attached at the end of this mail).

Thanks,
-Beau

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index f8a594a50ae6..d2c06ba26db4 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -561,6 +561,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
if (strcmp(ei_child->name, name) != 0)
continue;
ret = simple_lookup(dir, dentry, flags);
+   if (IS_ERR(ret))
+   goto out;
create_dir_dentry(ei, ei_child, ei_dentry, true);
created = true;
break;
@@ -583,6 +585,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
if (r <= 0)
continue;
ret = simple_lookup(dir, dentry, flags);
+   if (IS_ERR(ret))
+   goto out;
create_file_dentry(ei, i, ei_dentry, name, mode, cdata,
   fops, true);
break;



Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-12-08 Thread Mike Christie
On 12/8/23 3:24 AM, Tobias Huschle wrote:
> On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
>>> 3. vhost looping endlessly, waiting for kworker to be scheduled
>>>
>>> I dug a little deeper on what the vhost is doing. I'm not an expert on
>>> virtio whatsoever, so these are just educated guesses that maybe
>>> someone can verify/correct. Please bear with me probably messing up 
>>> the terminology.
>>>
>>> - vhost is looping through available queues.
>>> - vhost wants to wake up a kworker to process a found queue.
>>> - kworker does something with that queue and terminates quickly.
>>>
>>> What I found by throwing in some very noisy trace statements was that,
>>> if the kworker is not woken up, the vhost just keeps looping across
>>> all available queues (and seems to repeat itself). So it essentially
>>> relies on the scheduler to schedule the kworker fast enough. Otherwise
>>> it will just keep on looping until it is migrated off the CPU.
>>
>>
>> Normally it takes the buffers off the queue and is done with it.
>> I am guessing that at the same time guest is running on some other
>> CPU and keeps adding available buffers?
>>
> 
> It seems to do just that; there are multiple other vhost instances
> involved which might keep filling up those queues.
> 
> Unfortunately, this makes the problematic vhost instance stay on
> the CPU and prevents said kworker from being scheduled. The kworker is
> explicitly woken up by vhost, so vhost wants it to do something.
> 
> At this point it seems that there is an assumption about the scheduler
> in place which is no longer fulfilled by EEVDF. From the discussion so
> far, it seems like EEVDF does what it is intended to do.
> 
> Shouldn't there be a more explicit mechanism in use that allows the
> kworker to be scheduled in favor of the vhost?
> 
> It is also concerning that the vhost apparently cannot be preempted by
> the scheduler while executing that loop.
> 

Hey,

I recently noticed this change:

commit 05bfb338fa8dd40b008ce443e397fc374f6bd107
Author: Josh Poimboeuf 
Date:   Fri Feb 24 08:50:01 2023 -0800

vhost: Fix livepatch timeouts in vhost_worker()

We used to do:

while (1)
for each vhost work item in list
execute work item
if (need_resched())
schedule();

and after that patch we do:

while (1)
for each vhost work item in list
execute work item
cond_resched()


Would the need_resched check we used to have give you what
you wanted?
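
I.e., something like this in the work loop of vhost_worker() (sketch from
memory of drivers/vhost/vhost.c, untested):

	llist_for_each_entry_safe(work, work_next, node, node) {
		clear_bit(VHOST_WORK_QUEUED, &work->flags);
		work->fn(work);
		/* pre-patch behaviour: only schedule() when the
		 * scheduler actually asked for it */
		if (need_resched())
			schedule();
	}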



Re: [PATCH v2 23/33] s390/boot: Add the KMSAN runtime stub

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> It should be possible to have inline functions in the s390 header
> files, which call kmsan_unpoison_memory(). The problem is that these
> header files might be included by the decompressor, which does not
> contain KMSAN runtime, causing linker errors.
>
> Not compiling these calls if __SANITIZE_MEMORY__ is not defined -
> either by changing kmsan-checks.h or at the call sites - may cause
> unintended side effects, since calling these functions from
> uninstrumented code that is linked into the kernel is a valid use case.
>
> One might want to explicitly distinguish between the kernel and the
> decompressor. Checking for a decompressor-specific #define is quite
> heavy-handed, and will have to be done at all call sites.
>
> A more generic approach is to provide a dummy kmsan_unpoison_memory()
> definition. This produces some runtime overhead, but only when building
> with CONFIG_KMSAN. The benefit is that it does not disturb the existing
> KMSAN build logic and call sites don't need to be changed.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH v2 09/33] kmsan: Introduce kmsan_memmove_metadata()

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:07 PM Ilya Leoshkevich  wrote:
>
> It is useful to manually copy metadata in order to describe the effects
> of memmove()-like logic in uninstrumented code or inline asm. Introduce
> kmsan_memmove_metadata() for this purpose.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  include/linux/kmsan-checks.h | 14 ++
>  mm/kmsan/hooks.c | 11 +++
>  2 files changed, 25 insertions(+)
>
> diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
> index c4cae333deec..5218973f0ad0 100644
> --- a/include/linux/kmsan-checks.h
> +++ b/include/linux/kmsan-checks.h
> @@ -61,6 +61,17 @@ void kmsan_check_memory(const void *address, size_t size);
>  void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
> size_t left);
>
> +/**
> + * kmsan_memmove_metadata() - Copy kernel memory range metadata.
> + * @dst: start of the destination kernel memory range.
> + * @src: start of the source kernel memory range.
> + * @n:   size of the memory ranges.
> + *
> + * KMSAN will treat the destination range as if its contents were memmove()d
> + * from the source range.
> + */
> +void kmsan_memmove_metadata(void *dst, const void *src, size_t n);

As noted in patch 18/33, I am pretty sure we shouldn't need this function.



Re: [PATCH v2 18/33] lib/string: Add KMSAN support to strlcpy() and strlcat()

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> Currently KMSAN does not fully propagate metadata in strlcpy() and
> strlcat(), because they are built with -ffreestanding and call
> memcpy(). In this combination memcpy() calls are not instrumented.

Is this something specific to s390?

> Fix by copying the metadata manually. Add the __STDC_HOSTED__ #ifdef in
> case the code is compiled with different flags in the future.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  lib/string.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/lib/string.c b/lib/string.c
> index be26623953d2..e83c6dd77ec6 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -111,6 +111,9 @@ size_t strlcpy(char *dest, const char *src, size_t size)
> if (size) {
> size_t len = (ret >= size) ? size - 1 : ret;
> __builtin_memcpy(dest, src, len);

On x86, I clearly see this __builtin_memcpy() being replaced with
__msan_memcpy().
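
For reference, my understanding of what the (truncated) hunk adds, based
on the commit message - the exact #if condition is my guess:

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		__builtin_memcpy(dest, src, len);
	#if !__STDC_HOSTED__
		/* -ffreestanding: memcpy() is not instrumented, so
		 * propagate the shadow metadata by hand */
		kmsan_memmove_metadata(dest, src, len);
	#endif
		dest[len] = '\0';
	}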



Re: [PATCH v2 04/33] kmsan: Increase the maximum store size to 4096

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:07 PM Ilya Leoshkevich  wrote:
>
> The inline assembly block in s390's chsc() stores that much.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support

2023-12-08 Thread Tanmay Shah


On 12/8/23 9:51 AM, Mathieu Poirier wrote:
> On Wed, Dec 06, 2023 at 12:06:45PM -0600, Tanmay Shah wrote:
> > 
> > On 12/6/23 9:43 AM, Mathieu Poirier wrote:
> > > On Fri, 1 Dec 2023 at 11:10, Tanmay Shah  wrote:
> > > >
> > > >
> > > > On 11/29/23 11:10 AM, Mathieu Poirier wrote:
> > > > > On Mon, Nov 27, 2023 at 10:33:05AM -0600, Tanmay Shah wrote:
> > > > > >
> > > > > > On 11/23/23 12:11 PM, Mathieu Poirier wrote:
> > > > > > > On Wed, Nov 22, 2023 at 03:00:36PM -0600, Tanmay Shah wrote:
> > > > > > > > Hi Mathieu,
> > > > > > > >
> > > > > > > > Please find my comments below.
> > > > > > > >
> > > > > > > > On 11/21/23 4:59 PM, Mathieu Poirier wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote:
> > > > > > > > > > Use TCM pm domains extracted from device-tree
> > > > > > > > > > to power on/off TCM using general pm domain framework.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Tanmay Shah 
> > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > Changes in v7:
> > > > > > > > > >   - %s/pm_dev1/pm_dev_core0/r
> > > > > > > > > >   - %s/pm_dev_link1/pm_dev_core0_link/r
> > > > > > > > > >   - %s/pm_dev2/pm_dev_core1/r
> > > > > > > > > >   - %s/pm_dev_link2/pm_dev_core1_link/r
> > > > > > > > > >   - remove pm_domain_id check to move next patch
> > > > > > > > > >   - add comment about how 1st entry in pm domain list is 
> > > > > > > > > > used
> > > > > > > > > >   - fix loop when jump to fail_add_pm_domains loop
> > > > > > > > > >
> > > > > > > > > >  drivers/remoteproc/xlnx_r5_remoteproc.c | 215 
> > > > > > > > > > +++-
> > > > > > > > > >  1 file changed, 212 insertions(+), 3 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> > > > > > > > > > b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > > index 4395edea9a64..22bccc5075a0 100644
> > > > > > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > > @@ -16,6 +16,7 @@
> > > > > > > > > >  #include 
> > > > > > > > > >  #include 
> > > > > > > > > >  #include 
> > > > > > > > > > +#include 
> > > > > > > > > >
> > > > > > > > > >  #include "remoteproc_internal.h"
> > > > > > > > > >
> > > > > > > > > > @@ -102,6 +103,12 @@ static const struct mem_bank_data 
> > > > > > > > > > zynqmp_tcm_banks_lockstep[] = {
> > > > > > > > > >   * @rproc: rproc handle
> > > > > > > > > >   * @pm_domain_id: RPU CPU power domain id
> > > > > > > > > >   * @ipi: pointer to mailbox information
> > > > > > > > > > + * @num_pm_dev: number of tcm pm domain devices for this 
> > > > > > > > > > core
> > > > > > > > > > + * @pm_dev_core0: pm domain virtual devices for power 
> > > > > > > > > > domain framework
> > > > > > > > > > + * @pm_dev_core0_link: pm domain device links after 
> > > > > > > > > > registration
> > > > > > > > > > + * @pm_dev_core1: used only in lockstep mode. second 
> > > > > > > > > > core's pm domain virtual devices
> > > > > > > > > > + * @pm_dev_core1_link: used only in lockstep mode. second 
> > > > > > > > > > core's pm device links after
> > > > > > > > > > + * registration
> > > > > > > > > >   */
> > > > > > > > > >  struct zynqmp_r5_core {
> > > > > > > > > > struct device *dev;
> > > > > > > > > > @@ -111,6 +118,11 @@ struct zynqmp_r5_core {
> > > > > > > > > > struct rproc *rproc;
> > > > > > > > > > u32 pm_domain_id;
> > > > > > > > > > struct mbox_info *ipi;
> > > > > > > > > > +   int num_pm_dev;
> > > > > > > > > > +   struct device **pm_dev_core0;
> > > > > > > > > > +   struct device_link **pm_dev_core0_link;
> > > > > > > > > > +   struct device **pm_dev_core1;
> > > > > > > > > > +   struct device_link **pm_dev_core1_link;
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > >  /**
> > > > > > > > > > @@ -651,7 +663,8 @@ static int 
> > > > > > > > > > add_tcm_carveout_lockstep_mode(struct rproc *rproc)
> > > > > > > > > >  
> > > > > > > > > > ZYNQMP_PM_CAPABILITY_ACCESS, 0,
> > > > > > > > > >  
> > > > > > > > > > ZYNQMP_PM_REQUEST_ACK_BLOCKING);
> > > > > > > > > > if (ret < 0) {
> > > > > > > > > > -   dev_err(dev, "failed to turn on TCM 
> > > > > > > > > > 0x%x", pm_domain_id);
> > > > > > > > > > +   dev_err(dev, "failed to turn on TCM 
> > > > > > > > > > 0x%x",
> > > > > > > > > > +   pm_domain_id);
> > > > > > > > >
> > > > > > > > > Spurious change, you should have caught that.
> > > > > > > >
> > > > > > > > Ack, need to observe changes more closely before sending them.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > goto release_tcm_lockstep;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > @@ -758,6 +771,189 @@ static int 

Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support

2023-12-08 Thread Mathieu Poirier
On Wed, Dec 06, 2023 at 12:06:45PM -0600, Tanmay Shah wrote:
> 
> On 12/6/23 9:43 AM, Mathieu Poirier wrote:
> > On Fri, 1 Dec 2023 at 11:10, Tanmay Shah  wrote:
> > >
> > >
> > > On 11/29/23 11:10 AM, Mathieu Poirier wrote:
> > > > On Mon, Nov 27, 2023 at 10:33:05AM -0600, Tanmay Shah wrote:
> > > > >
> > > > > On 11/23/23 12:11 PM, Mathieu Poirier wrote:
> > > > > > On Wed, Nov 22, 2023 at 03:00:36PM -0600, Tanmay Shah wrote:
> > > > > > > Hi Mathieu,
> > > > > > >
> > > > > > > Please find my comments below.
> > > > > > >
> > > > > > > On 11/21/23 4:59 PM, Mathieu Poirier wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote:
> > > > > > > > > Use TCM pm domains extracted from device-tree
> > > > > > > > > to power on/off TCM using general pm domain framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Tanmay Shah 
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Changes in v7:
> > > > > > > > >   - %s/pm_dev1/pm_dev_core0/r
> > > > > > > > >   - %s/pm_dev_link1/pm_dev_core0_link/r
> > > > > > > > >   - %s/pm_dev2/pm_dev_core1/r
> > > > > > > > >   - %s/pm_dev_link2/pm_dev_core1_link/r
> > > > > > > > >   - remove pm_domain_id check to move next patch
> > > > > > > > >   - add comment about how 1st entry in pm domain list is used
> > > > > > > > >   - fix loop when jump to fail_add_pm_domains loop
> > > > > > > > >
> > > > > > > > >  drivers/remoteproc/xlnx_r5_remoteproc.c | 215 
> > > > > > > > > +++-
> > > > > > > > >  1 file changed, 212 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> > > > > > > > > b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > index 4395edea9a64..22bccc5075a0 100644
> > > > > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > > > > @@ -16,6 +16,7 @@
> > > > > > > > >  #include 
> > > > > > > > >  #include 
> > > > > > > > >  #include 
> > > > > > > > > +#include 
> > > > > > > > >
> > > > > > > > >  #include "remoteproc_internal.h"
> > > > > > > > >
> > > > > > > > > @@ -102,6 +103,12 @@ static const struct mem_bank_data 
> > > > > > > > > zynqmp_tcm_banks_lockstep[] = {
> > > > > > > > >   * @rproc: rproc handle
> > > > > > > > >   * @pm_domain_id: RPU CPU power domain id
> > > > > > > > >   * @ipi: pointer to mailbox information
> > > > > > > > > + * @num_pm_dev: number of tcm pm domain devices for this core
> > > > > > > > > + * @pm_dev_core0: pm domain virtual devices for power domain 
> > > > > > > > > framework
> > > > > > > > > + * @pm_dev_core0_link: pm domain device links after 
> > > > > > > > > registration
> > > > > > > > > + * @pm_dev_core1: used only in lockstep mode. second core's 
> > > > > > > > > pm domain virtual devices
> > > > > > > > > + * @pm_dev_core1_link: used only in lockstep mode. second 
> > > > > > > > > core's pm device links after
> > > > > > > > > + * registration
> > > > > > > > >   */
> > > > > > > > >  struct zynqmp_r5_core {
> > > > > > > > > struct device *dev;
> > > > > > > > > @@ -111,6 +118,11 @@ struct zynqmp_r5_core {
> > > > > > > > > struct rproc *rproc;
> > > > > > > > > u32 pm_domain_id;
> > > > > > > > > struct mbox_info *ipi;
> > > > > > > > > +   int num_pm_dev;
> > > > > > > > > +   struct device **pm_dev_core0;
> > > > > > > > > +   struct device_link **pm_dev_core0_link;
> > > > > > > > > +   struct device **pm_dev_core1;
> > > > > > > > > +   struct device_link **pm_dev_core1_link;
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  /**
> > > > > > > > > @@ -651,7 +663,8 @@ static int 
> > > > > > > > > add_tcm_carveout_lockstep_mode(struct rproc *rproc)
> > > > > > > > >  
> > > > > > > > > ZYNQMP_PM_CAPABILITY_ACCESS, 0,
> > > > > > > > >  
> > > > > > > > > ZYNQMP_PM_REQUEST_ACK_BLOCKING);
> > > > > > > > > if (ret < 0) {
> > > > > > > > > -   dev_err(dev, "failed to turn on TCM 
> > > > > > > > > 0x%x", pm_domain_id);
> > > > > > > > > +   dev_err(dev, "failed to turn on TCM 0x%x",
> > > > > > > > > +   pm_domain_id);
> > > > > > > >
> > > > > > > > Spurious change, you should have caught that.
> > > > > > >
> > > > > > > Ack, need to observe changes more closely before sending them.
> > > > > > >
> > > > > > > >
> > > > > > > > > goto release_tcm_lockstep;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > @@ -758,6 +771,189 @@ static int zynqmp_r5_parse_fw(struct 
> > > > > > > > > rproc *rproc, const struct firmware *fw)
> > > > > > > > > return ret;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void zynqmp_r5_remove_pm_domains(struct rproc *rproc)
> > > > > > > > > +{
> > > > > > > > > +   struct zynqmp_r5_core *r5_core = 

Re: [PATCH v2 13/33] kmsan: Introduce memset_no_sanitize_memory()

2023-12-08 Thread Alexander Potapenko
> A problem with __memset() is that, at least for me, it always ends
> up being a call. There is a use case where we need to write only 1
> byte, so I thought that introducing a call there (when compiling
> without KMSAN) would be unacceptable.

Wonder what happens with that use case if we e.g. build with fortify-source.
Calling memset() for a single byte might indicate that the code is not hot.

> > ...
> >
> > > +__no_sanitize_memory
> > > +static inline void *memset_no_sanitize_memory(void *s, int c,
> > > size_t n)
> > > +{
> > > +   return memset(s, c, n);
> > > +}
> >
> > I think depending on the compiler optimizations this might end up
> > being a call to normal memset, that would still change the shadow
> > bytes.
>
> Interesting, do you have some specific scenario in mind? I vaguely
> remember that in the past there were cases when sanitizer annotations
> were lost after inlining, but I thought they were sorted out?

Sanitizer annotations are indeed lost after inlining, and we cannot do
much about that.
They are implemented using function attributes, and if a function
dissolves after inlining, we cannot possibly know which instructions
belonged to it.

Consider the following example (also available at
https://godbolt.org/z/5r7817G8e):

==
void *kmalloc(int size);

__attribute__((no_sanitize("kernel-memory")))
__attribute__((always_inline))
static void *memset_nosanitize(void *s, int c, int n) {
  return __builtin_memset(s, c, n);
}

void *do_something_nosanitize(int size) {
  void *ptr = kmalloc(size);
  memset_nosanitize(ptr, 0, size);
  return ptr;
}

void *do_something_sanitize(int size) {
  void *ptr = kmalloc(size);
  __builtin_memset(ptr, 0, size);
  return ptr;
}
==

If memset_nosanitize() has __attribute__((always_inline)), the
compiler generates the same LLVM IR calling __msan_memset() for both
do_something_nosanitize() and do_something_sanitize().
If we comment out this attribute, do_something_nosanitize() calls
memset_nosanitize(), which doesn't have the sanitize_memory attribute.

But even now __builtin_memset() is still calling __msan_memset(),
because __attribute__((no_sanitize("kernel-memory"))) somewhat
counterintuitively still preserves some instrumentation (see
include/linux/compiler-clang.h for details).
Replacing __attribute__((no_sanitize("kernel-memory"))) with
__attribute__((disable_sanitizer_instrumentation)) fixes this
situation:

define internal fastcc noundef ptr @memset_nosanitize(void*, int,
int)(ptr noundef returned writeonly %s, i32 noundef %n) unnamed_addr
#2 {
entry:
%conv = sext i32 %n to i64
tail call void @llvm.memset.p0.i64(ptr align 1 %s, i8 0, i64 %conv, i1 false)
ret ptr %s
}

>
> And, in any case, if this were to happen, would not it be considered a
> compiler bug that needs fixing there, and not in the kernel?

As stated above, I don't think so - this is more or less working as intended.
If we really want the ability to inline __memset(), we could transform
it into memset() in non-sanitizer builds, but perhaps having a call is
also acceptable?
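
If the extra call is really unacceptable for the 1-byte case, one option
(sketch only) would be to pay for it exclusively in KMSAN builds:

	#ifdef CONFIG_KMSAN
	/* out-of-line, uninstrumented: leaves the shadow untouched */
	#define memset_no_sanitize_memory(s, c, n) __memset((s), (c), (n))
	#else
	/* no KMSAN: plain memset(), free to be inlined by the compiler */
	#define memset_no_sanitize_memory(s, c, n) memset((s), (c), (n))
	#endif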



Re: [PATCH v5 4/5] x86/paravirt: switch mixed paravirt/alternative calls to alternative_2

2023-12-08 Thread Juergen Gross

On 08.12.23 13:57, Borislav Petkov wrote:
> On Fri, Dec 08, 2023 at 12:53:47PM +0100, Juergen Gross wrote:
>> Took me a while to find it. Patch 5 was repairing the issue again
>
> Can't have that. Any patch in any set must build and boot successfully
> for bisection reasons.

Of course.

The problem will be fixed in V6.


Juergen




[PATCH v3 11/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable WiFi

2023-12-08 Thread Luca Weiss
Now that the WPSS remoteproc is enabled, enable wifi so we can use it.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
Depends on (just to resolve merge conflicts, could also rebase without
that):
https://lore.kernel.org/linux-arm-msm/20231201-sc7280-venus-pas-v3-3-bc132dc5f...@fairphone.com/
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 830dee4f2a37..03ec0115b424 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -914,3 +914,8 @@ &venus {
firmware-name = "qcom/qcm6490/fairphone5/venus.mbn";
status = "okay";
 };
+
+&wifi {
+   qcom,ath11k-calibration-variant = "Fairphone_5";
+   status = "okay";
+};

-- 
2.43.0




[PATCH v3 10/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable various remoteprocs

2023-12-08 Thread Luca Weiss
Enable the ADSP, CDSP, MPSS and WPSS that are found on the SoC.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 9c25e28a62d9..830dee4f2a37 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -691,6 +691,26 @@ _id_1 {
status = "okay";
 };
 
+&remoteproc_adsp {
+   firmware-name = "qcom/qcm6490/fairphone5/adsp.mbn";
+   status = "okay";
+};
+
+&remoteproc_cdsp {
+   firmware-name = "qcom/qcm6490/fairphone5/cdsp.mbn";
+   status = "okay";
+};
+
+&remoteproc_mpss {
+   firmware-name = "qcom/qcm6490/fairphone5/modem.mbn";
+   status = "okay";
+};
+
+&remoteproc_wpss {
+   firmware-name = "qcom/qcm6490/fairphone5/wpss.mbn";
+   status = "okay";
+};
+
 _clk {
drive-strength = <16>;
bias-disable;

-- 
2.43.0




[PATCH v3 08/11] arm64: dts: qcom: sc7280: Add ADSP node

2023-12-08 Thread Luca Weiss
Add the node for the ADSP found on the SC7280 SoC, using standard
Qualcomm firmware.

Acked-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts |  5 --
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi |  5 --
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 74 ++
 3 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 10f4c75aed3f..b1ea31720d7b 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -77,11 +77,6 @@ cont_splash_mem: cont-splash@e100 {
no-map;
};
 
-   adsp_mem: adsp@8670 {
-   reg = <0x0 0x8670 0x0 0x280>;
-   no-map;
-   };
-
cdsp_mem: cdsp@88f0 {
reg = <0x0 0x88f0 0x0 0x1e0>;
no-map;
diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
index 8f7682fe254a..a60fb58d1bf1 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
@@ -26,11 +26,6 @@
 
 / {
reserved-memory {
-   adsp_mem: memory@8670 {
-   reg = <0x0 0x8670 0x0 0x280>;
-   no-map;
-   };
-
camera_mem: memory@8ad0 {
reg = <0x0 0x8ad0 0x0 0x50>;
no-map;
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index f404276361fa..6d319c8c6acf 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -143,6 +143,11 @@ wlan_fw_mem: wlan-fw@80c0 {
no-map;
};
 
+   adsp_mem: adsp@8670 {
+   reg = <0x0 0x8670 0x0 0x280>;
+   no-map;
+   };
+
video_mem: video@8b20 {
reg = <0x0 0x8b20 0x0 0x50>;
no-map;
@@ -3600,6 +3605,75 @@ qspi: spi@88dc000 {
status = "disabled";
};
 
+   remoteproc_adsp: remoteproc@370 {
+   compatible = "qcom,sc7280-adsp-pas";
+   reg = <0 0x0370 0 0x100>;
+
+   interrupts-extended = <&pdc 6 IRQ_TYPE_LEVEL_HIGH>,
+ <&adsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 7 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready", "handover",
+ "stop-ack", "shutdown-ack";
+
+   clocks = <&rpmhcc RPMH_CXO_CLK>;
+   clock-names = "xo";
+
+   power-domains = <&rpmhpd SC7280_LCX>,
+   <&rpmhpd SC7280_LMX>;
+   power-domain-names = "lcx", "lmx";
+
+   memory-region = <&adsp_mem>;
+
+   qcom,qmp = <&aoss_qmp>;
+
+   qcom,smem-states = <&adsp_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts-extended = <&ipcc IPCC_CLIENT_LPASS
+  IPCC_MPROC_SIGNAL_GLINK_QMP
+  IRQ_TYPE_EDGE_RISING>;
+
+   mboxes = <&ipcc IPCC_CLIENT_LPASS
+  IPCC_MPROC_SIGNAL_GLINK_QMP>;
+
+   label = "lpass";
+   qcom,remote-pid = <2>;
+
+   fastrpc {
+   compatible = "qcom,fastrpc";
+   qcom,glink-channels = "fastrpcglink-apps-dsp";
+   label = "adsp";
+   qcom,non-secure-domain;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   compute-cb@3 {
+   compatible = "qcom,fastrpc-compute-cb";
+   reg = <3>;
+   iommus = <&apps_smmu 

[PATCH v3 09/11] arm64: dts: qcom: sc7280: Add CDSP node

2023-12-08 Thread Luca Weiss
Add the node for the CDSP found on the SC7280 SoC, using standard
Qualcomm firmware.

Remove the reserved-memory node from sc7280-chrome-common since CDSP is
currently not used there.

Acked-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts |   5 -
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi |   6 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 143 +
 3 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index b1ea31720d7b..9c25e28a62d9 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -77,11 +77,6 @@ cont_splash_mem: cont-splash@e100 {
no-map;
};
 
-   cdsp_mem: cdsp@88f0 {
-   reg = <0x0 0x88f0 0x0 0x1e0>;
-   no-map;
-   };
-
rmtfs_mem: memory@f850 {
compatible = "qcom,rmtfs-mem";
reg = <0x0 0xf850 0x0 0x60>;
diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
index a60fb58d1bf1..c53a5f32915a 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
@@ -17,6 +17,7 @@
  * required by the setup for Chrome boards.
  */
 
+/delete-node/ &cdsp_mem;
/delete-node/ &gpu_zap_mem;
/delete-node/ &gpu_zap_shader;
/delete-node/ &venus_mem;
@@ -86,6 +87,11 @@ spi_flash: flash@0 {
};
 };
 
+/* Currently not used */
+&remoteproc_cdsp {
+   /delete-property/ memory-region;
+};
+
 &remoteproc_wpss {
compatible = "qcom,sc7280-wpss-pil";
clocks = < GCC_WPSS_AHB_BDG_MST_CLK>,
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 6d319c8c6acf..0c25b4ecd0dc 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -148,6 +148,11 @@ adsp_mem: adsp@8670 {
no-map;
};
 
+   cdsp_mem: cdsp@88f0 {
+   reg = <0x0 0x88f0 0x0 0x1e0>;
+   no-map;
+   };
+
video_mem: video@8b20 {
reg = <0x0 0x8b20 0x0 0x50>;
no-map;
@@ -3842,6 +3847,144 @@ nsp_noc: interconnect@a0c {
qcom,bcm-voters = <_bcm_voter>;
};
 
+   remoteproc_cdsp: remoteproc@a30 {
+   compatible = "qcom,sc7280-cdsp-pas";
+   reg = <0 0x0a30 0 0x1>;
+
+   interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>,
+ <&cdsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <&cdsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <&cdsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <&cdsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>,
+ <&cdsp_smp2p_in 7 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready", "handover",
+ "stop-ack", "shutdown-ack";
+
+   clocks = <&rpmhcc RPMH_CXO_CLK>;
+   clock-names = "xo";
+
+   power-domains = <&rpmhpd SC7280_CX>,
+   <&rpmhpd SC7280_MX>;
+   power-domain-names = "cx", "mx";
+
+   interconnects = <&nsp_noc MASTER_CDSP_PROC 0 &mc_virt SLAVE_EBI1 0>;
+
+   memory-region = <&cdsp_mem>;
+
+   qcom,qmp = <&aoss_qmp>;
+
+   qcom,smem-states = <&cdsp_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts-extended = <&ipcc IPCC_CLIENT_CDSP
+  IPCC_MPROC_SIGNAL_GLINK_QMP
+  IRQ_TYPE_EDGE_RISING>;
+   mboxes = <&ipcc IPCC_CLIENT_CDSP
+  IPCC_MPROC_SIGNAL_GLINK_QMP>;
+
+   label = "cdsp";
+   qcom,remote-pid = <5>;
+
+   fastrpc {
+   compatible = "qcom,fastrpc";
+   qcom,glink-channels = "fastrpcglink-apps-dsp";
+   label = "cdsp";
+   qcom,non-secure-domain;
+   #address-cells = <1>;
+  

[PATCH v3 07/11] arm64: dts: qcom: sc7280: Use WPSS PAS instead of PIL

2023-12-08 Thread Luca Weiss
The wpss-pil driver wants to manage too many resources that cannot be
touched with standard Qualcomm firmware.

Use the compatible from the PAS driver and move the ChromeOS-specific
bits to sc7280-chrome-common.dtsi.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 19 ++-
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 15 +++
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
index fd3ff576d1fc..8f7682fe254a 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
@@ -92,8 +92,25 @@ spi_flash: flash@0 {
 };
 
 &remoteproc_wpss {
-   status = "okay";
+   compatible = "qcom,sc7280-wpss-pil";
+   clocks = <&gcc GCC_WPSS_AHB_BDG_MST_CLK>,
+<&gcc GCC_WPSS_AHB_CLK>,
+<&gcc GCC_WPSS_RSCP_CLK>,
+<&rpmhcc RPMH_CXO_CLK>;
+   clock-names = "ahb_bdg",
+ "ahb",
+ "rscp",
+ "xo";
+
+   resets = <&aoss_reset AOSS_CC_WCSS_RESTART>,
+<&pdc_reset PDC_WPSS_SYNC_RESET>;
+   reset-names = "restart", "pdc_sync";
+
+   qcom,halt-regs = <&tcsr_1 0x17000>;
+
firmware-name = "ath11k/WCN6750/hw1.0/wpss.mdt";
+
+   status = "okay";
 };
 
  {
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 221ab163c8ad..f404276361fa 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -3601,7 +3601,7 @@ qspi: spi@88dc000 {
};
 
remoteproc_wpss: remoteproc@8a0 {
-   compatible = "qcom,sc7280-wpss-pil";
+   compatible = "qcom,sc7280-wpss-pas";
reg = <0 0x08a0 0 0x1>;
 
interrupts-extended = <&intc GIC_SPI 587 IRQ_TYPE_EDGE_RISING>,
@@ -3613,12 +3613,8 @@ remoteproc_wpss: remoteproc@8a0 {
interrupt-names = "wdog", "fatal", "ready", "handover",
  "stop-ack", "shutdown-ack";
 
-   clocks = <&gcc GCC_WPSS_AHB_BDG_MST_CLK>,
-<&gcc GCC_WPSS_AHB_CLK>,
-<&gcc GCC_WPSS_RSCP_CLK>,
-<&rpmhcc RPMH_CXO_CLK>;
-   clock-names = "ahb_bdg", "ahb",
- "rscp", "xo";
+   clocks = <&rpmhcc RPMH_CXO_CLK>;
+   clock-names = "xo";
 
power-domains = <&rpmhpd SC7280_CX>,
<&rpmhpd SC7280_MX>;
@@ -3631,11 +3627,6 @@ remoteproc_wpss: remoteproc@8a0 {
qcom,smem-states = <&wpss_smp2p_out 0>;
qcom,smem-state-names = "stop";
 
-   resets = <&aoss_reset AOSS_CC_WCSS_RESTART>,
-<&pdc_reset PDC_WPSS_SYNC_RESET>;
-   reset-names = "restart", "pdc_sync";
-
-   qcom,halt-regs = <&tcsr_1 0x17000>;
 
status = "disabled";
 

-- 
2.43.0




[PATCH v3 03/11] arm64: dts: qcom: sc7280: Rename reserved-memory nodes

2023-12-08 Thread Luca Weiss
It was clarified a while ago that reserved-memory nodes shouldn't be
called memory@ but should have a descriptive name. Update sc7280.dtsi to
follow that.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sc7280.dtsi | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 1b40e18ff152..f4d02d9dcb55 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -92,63 +92,63 @@ reserved-memory {
#size-cells = <2>;
ranges;
 
-   wlan_ce_mem: memory@4cd000 {
+   wlan_ce_mem: wlan-ce@4cd000 {
no-map;
reg = <0x0 0x004cd000 0x0 0x1000>;
};
 
-   hyp_mem: memory@8000 {
+   hyp_mem: hyp@8000 {
reg = <0x0 0x8000 0x0 0x60>;
no-map;
};
 
-   xbl_mem: memory@8060 {
+   xbl_mem: xbl@8060 {
reg = <0x0 0x8060 0x0 0x20>;
no-map;
};
 
-   aop_mem: memory@8080 {
+   aop_mem: aop@8080 {
reg = <0x0 0x8080 0x0 0x6>;
no-map;
};
 
-   aop_cmd_db_mem: memory@8086 {
+   aop_cmd_db_mem: aop-cmd-db@8086 {
reg = <0x0 0x8086 0x0 0x2>;
compatible = "qcom,cmd-db";
no-map;
};
 
-   reserved_xbl_uefi_log: memory@8088 {
+   reserved_xbl_uefi_log: xbl-uefi-res@8088 {
reg = <0x0 0x80884000 0x0 0x1>;
no-map;
};
 
-   sec_apps_mem: memory@808ff000 {
+   sec_apps_mem: sec-apps@808ff000 {
reg = <0x0 0x808ff000 0x0 0x1000>;
no-map;
};
 
-   smem_mem: memory@8090 {
+   smem_mem: smem@8090 {
reg = <0x0 0x8090 0x0 0x20>;
no-map;
};
 
-   cpucp_mem: memory@80b0 {
+   cpucp_mem: cpucp@80b0 {
no-map;
reg = <0x0 0x80b0 0x0 0x10>;
};
 
-   wlan_fw_mem: memory@80c0 {
+   wlan_fw_mem: wlan-fw@80c0 {
reg = <0x0 0x80c0 0x0 0xc0>;
no-map;
};
 
-   video_mem: memory@8b20 {
+   video_mem: video@8b20 {
reg = <0x0 0x8b20 0x0 0x50>;
no-map;
};
 
-   ipa_fw_mem: memory@8b70 {
+   ipa_fw_mem: ipa-fw@8b70 {
reg = <0 0x8b70 0 0x1>;
no-map;
};
@@ -158,7 +158,7 @@ gpu_zap_mem: zap@8b71a000 {
no-map;
};
 
-   rmtfs_mem: memory@9c90 {
+   rmtfs_mem: rmtfs@9c90 {
compatible = "qcom,rmtfs-mem";
reg = <0x0 0x9c90 0x0 0x28>;
no-map;

-- 
2.43.0




[PATCH v3 06/11] remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS

2023-12-08 Thread Luca Weiss
Add support for the ADSP, CDSP and WPSS remoteprocs found on the SC7280
SoC using the q6v5-pas driver.

This driver can be used on regular LA ("Linux Android") based releases;
however, the SC7280 ChromeOS devices need different driver support due to
firmware differences.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 drivers/remoteproc/qcom_q6v5_pas.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 913a5d2068e8..a9dd58608052 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -1165,6 +1165,22 @@ static const struct adsp_data sm8550_mpss_resource = {
.region_assign_idx = 2,
 };
 
+static const struct adsp_data sc7280_wpss_resource = {
+   .crash_reason_smem = 626,
+   .firmware_name = "wpss.mdt",
+   .pas_id = 6,
+   .auto_boot = true,
+   .proxy_pd_names = (char*[]){
+   "cx",
+   "mx",
+   NULL
+   },
+   .load_state = "wpss",
+   .ssr_name = "wpss",
+   .sysmon_name = "wpss",
+   .ssctl_id = 0x19,
+};
+
 static const struct of_device_id adsp_of_match[] = {
{ .compatible = "qcom,msm8226-adsp-pil", .data = &adsp_resource_init},
{ .compatible = "qcom,msm8953-adsp-pil", .data = &msm8996_adsp_resource},
@@ -1178,7 +1194,10 @@ static const struct of_device_id adsp_of_match[] = {
{ .compatible = "qcom,qcs404-wcss-pas", .data = &wcss_resource_init },
{ .compatible = "qcom,sc7180-adsp-pas", .data = &sm8250_adsp_resource},
{ .compatible = "qcom,sc7180-mpss-pas", .data = &mpss_resource_init},
+   { .compatible = "qcom,sc7280-adsp-pas", .data = &sm8350_adsp_resource},
+   { .compatible = "qcom,sc7280-cdsp-pas", .data = &sm8350_cdsp_resource},
{ .compatible = "qcom,sc7280-mpss-pas", .data = &mpss_resource_init},
+   { .compatible = "qcom,sc7280-wpss-pas", .data = &sc7280_wpss_resource},
{ .compatible = "qcom,sc8180x-adsp-pas", .data = &sm8150_adsp_resource},
{ .compatible = "qcom,sc8180x-cdsp-pas", .data = &sm8150_cdsp_resource},
{ .compatible = "qcom,sc8180x-mpss-pas", .data = &sc8180x_mpss_resource},

-- 
2.43.0




[PATCH v3 05/11] dt-bindings: remoteproc: qcom: sc7180-pas: Add SC7280 compatibles

2023-12-08 Thread Luca Weiss
Add the compatibles and constraints for the ADSP, CDSP and WPSS found on
the SC7280 SoC.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Luca Weiss 
---
 .../bindings/remoteproc/qcom,sc7180-pas.yaml| 21 +
 1 file changed, 21 insertions(+)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml 
b/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
index 6f0bd6fa5d26..c054b84fdcd5 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
@@ -18,7 +18,10 @@ properties:
 enum:
   - qcom,sc7180-adsp-pas
   - qcom,sc7180-mpss-pas
+  - qcom,sc7280-adsp-pas
+  - qcom,sc7280-cdsp-pas
   - qcom,sc7280-mpss-pas
+  - qcom,sc7280-wpss-pas
 
   reg:
 maxItems: 1
@@ -75,6 +78,7 @@ allOf:
 compatible:
   enum:
 - qcom,sc7180-adsp-pas
+- qcom,sc7280-adsp-pas
 then:
   properties:
 power-domains:
@@ -120,6 +124,23 @@ allOf:
 - const: cx
 - const: mss
 
+  - if:
+  properties:
+compatible:
+  enum:
+- qcom,sc7280-cdsp-pas
+- qcom,sc7280-wpss-pas
+then:
+  properties:
+power-domains:
+  items:
+- description: CX power domain
+- description: MX power domain
+power-domain-names:
+  items:
+- const: cx
+- const: mx
+
 unevaluatedProperties: false
 
 examples:

-- 
2.43.0




[PATCH v3 04/11] arm64: dts: qcom: sc7280*: move MPSS and WPSS memory to dtsi

2023-12-08 Thread Luca Weiss
It appears that all SC7280-based devices so far have mpss_mem and
wpss_mem at the same address and with the same size.

Also, these memory regions are already referenced in sc7280.dtsi, so
that's where they should be defined.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts  | 10 --
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi  |  5 -
 arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi  |  5 -
 arch/arm64/boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi |  1 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi| 10 ++
 5 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index de49bb11f3c7..10f4c75aed3f 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -87,16 +87,6 @@ cdsp_mem: cdsp@88f0 {
no-map;
};
 
-   mpss_mem: mpss@8b80 {
-   reg = <0x0 0x8b80 0x0 0xf60>;
-   no-map;
-   };
-
-   wpss_mem: wpss@9ae0 {
-   reg = <0x0 0x9ae0 0x0 0x190>;
-   no-map;
-   };
-
rmtfs_mem: memory@f850 {
compatible = "qcom,rmtfs-mem";
reg = <0x0 0xf850 0x0 0x60>;
diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
index 186aa82ce662..fd3ff576d1fc 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
@@ -40,11 +40,6 @@ venus_mem: memory@8b20 {
reg = <0x0 0x8b20 0x0 0x50>;
no-map;
};
-
-   wpss_mem: memory@9ae0 {
-   reg = <0x0 0x9ae0 0x0 0x190>;
-   no-map;
-   };
};
 };
 
diff --git a/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
index 203274c10532..b721a8546800 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
@@ -8,11 +8,6 @@
 
 / {
reserved-memory {
-   mpss_mem: memory@8b80 {
-   reg = <0x0 0x8b80 0x0 0xf60>;
-   no-map;
-   };
-
mba_mem: memory@9c70 {
reg = <0x0 0x9c70 0x0 0x20>;
no-map;
diff --git a/arch/arm64/boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi
index 2febd6126d4c..3ebc915f0dc2 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi
@@ -7,5 +7,6 @@
 
 /* WIFI SKUs save 256M by not having modem/mba/rmtfs memory regions defined. */
 
+/delete-node/ &mpss_mem;
/delete-node/ &remoteproc_mpss;
/delete-node/ &rmtfs_mem;
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index f4d02d9dcb55..221ab163c8ad 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -158,6 +158,16 @@ gpu_zap_mem: zap@8b71a000 {
no-map;
};
 
+   mpss_mem: mpss@8b80 {
+   reg = <0x0 0x8b80 0x0 0xf60>;
+   no-map;
+   };
+
+   wpss_mem: wpss@9ae0 {
+   reg = <0x0 0x9ae0 0x0 0x190>;
+   no-map;
+   };
+
rmtfs_mem: rmtfs@9c90 {
compatible = "qcom,rmtfs-mem";
reg = <0x0 0x9c90 0x0 0x28>;

-- 
2.43.0




[PATCH v3 02/11] arm64: dts: qcom: sc7280: Remove unused second MPSS reg

2023-12-08 Thread Luca Weiss
The bindings for sc7280-mpss-pas expect neither a second reg nor a
reg-names property; those are only required by the sc7280-mss-pil
bindings.

Move it to sc7280-herobrine-lte-sku.dtsi, the only place where that
other compatible is used.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi | 2 ++
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
index 95505549adcc..203274c10532 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi
@@ -33,6 +33,8 @@  {
 
 &remoteproc_mpss {
compatible = "qcom,sc7280-mss-pil";
+   reg = <0 0x0408 0 0x1>, <0 0x0418 0 0x48>;
+   reg-names = "qdsp6", "rmb";
 
clocks = <&gcc GCC_MSS_CFG_AHB_CLK>,
<&gcc GCC_MSS_OFFLINE_AXI_CLK>,
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 35208248f8cd..1b40e18ff152 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -2860,8 +2860,7 @@ adreno_smmu: iommu@3da {
 
remoteproc_mpss: remoteproc@408 {
compatible = "qcom,sc7280-mpss-pas";
-   reg = <0 0x0408 0 0x1>, <0 0x0418 0 0x48>;
-   reg-names = "qdsp6", "rmb";
+   reg = <0 0x0408 0 0x1>;
 
interrupts-extended = <&intc GIC_SPI 264 IRQ_TYPE_LEVEL_HIGH>,
  <&modem_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,

-- 
2.43.0




[PATCH v3 01/11] dt-bindings: remoteproc: qcom: sc7180-pas: Fix SC7280 MPSS PD-names

2023-12-08 Thread Luca Weiss
The power domains for MPSS on SC7280 are actually named CX and MSS, and
not CX and MX. Adjust the name which also aligns the bindings with the
dts and fixes validation.

Fixes: 8bb92d6fd0b3 ("dt-bindings: remoteproc: qcom,sc7180-pas: split into 
separate file")
Acked-by: Krzysztof Kozlowski 
Signed-off-by: Luca Weiss 
---
 Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml 
b/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
index f10f329677d8..6f0bd6fa5d26 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,sc7180-pas.yaml
@@ -114,11 +114,11 @@ allOf:
 power-domains:
   items:
 - description: CX power domain
-- description: MX power domain
+- description: MSS power domain
 power-domain-names:
   items:
 - const: cx
-- const: mx
+- const: mss
 
 unevaluatedProperties: false
 

-- 
2.43.0




[PATCH v3 00/11] Remoteprocs (ADSP, CDSP, WPSS) for SC7280

2023-12-08 Thread Luca Weiss
This series adds support for the ADSP, CDSP and WPSS remoteprocs found
on SC7280. And finally enable them and WiFi on the QCM6490-based
Fairphone 5 smartphone.

The first two patches are fixes for the MPSS to fix some dt validation
issues. They're included in this series to avoid conflicts with the
later patches and keep it simpler.

Then there's two patches reorganizing the reserved-memory setup for
sc7280 in preparations for the new remoteprocs.

Please note, that the ChromeOS-based devices using SC7280 need different
driver and dts support, similar to how there's already
qcom,sc7280-mpss-pas for "standard" firmware and there's
qcom,sc7280-mss-pil for ChromeOS firmware.

I'm aware of the series also adding SC7280 ADSP support, whose last
revision was sent in June this year:

https://lore.kernel.org/linux-arm-msm/20230616103534.4031331-1-quic_m...@quicinc.com/

However, there are some differences, since that series added the "pil"
variant for ChromeOS, not "pas" for standard firmware. Also it seems that
on ChromeOS devices gpr+q6apm+q6prm is used, while on my device it appears
to be apr+q6afe+q6asm+q6adm. I don't add either in this series to keep it
a bit simpler, and I couldn't test much of that yet.

Signed-off-by: Luca Weiss 
---
Changes in v3:
- Rebase on qcom for-next and resolve conflicts
- Pick up tags
- Link to v2: 
https://lore.kernel.org/r/20231113-sc7280-remoteprocs-v2-0-e5c5fd526...@fairphone.com

Changes in v2:
- Add patch renaming memory@ reserved-memory nodes (preparation for
  next)
- Add patch moving mpss_mem and wpss_mem to sc7280.dtsi
- Follow *_mem node being in sc7280.dtsi also for ADSP & CDSP patches
- Use (squashed) .mbn instead of (split) .mdt for FP5
- Set qcom,ath11k-calibration-variant for FP5
- Pick up tags (except for Krzysztof's R-b for ADSP & CDSP since there
  were changes)
- Link to v1: 
https://lore.kernel.org/r/20231027-sc7280-remoteprocs-v1-0-05ce95d93...@fairphone.com

---
Luca Weiss (11):
  dt-bindings: remoteproc: qcom: sc7180-pas: Fix SC7280 MPSS PD-names
  arm64: dts: qcom: sc7280: Remove unused second MPSS reg
  arm64: dts: qcom: sc7280: Rename reserved-memory nodes
  arm64: dts: qcom: sc7280*: move MPSS and WPSS memory to dtsi
  dt-bindings: remoteproc: qcom: sc7180-pas: Add SC7280 compatibles
  remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS
  arm64: dts: qcom: sc7280: Use WPSS PAS instead of PIL
  arm64: dts: qcom: sc7280: Add ADSP node
  arm64: dts: qcom: sc7280: Add CDSP node
  arm64: dts: qcom: qcm6490-fairphone-fp5: Enable various remoteprocs
  arm64: dts: qcom: qcm6490-fairphone-fp5: Enable WiFi

 .../bindings/remoteproc/qcom,sc7180-pas.yaml   |  21 ++
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts |  45 ++--
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi |  35 ++-
 .../boot/dts/qcom/sc7280-herobrine-lte-sku.dtsi|   7 +-
 .../boot/dts/qcom/sc7280-herobrine-wifi-sku.dtsi   |   1 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 271 +++--
 drivers/remoteproc/qcom_q6v5_pas.c |  19 ++
 7 files changed, 336 insertions(+), 63 deletions(-)
---
base-commit: e7f403a575a315ecf79ee4f411cc76bb60bae2f6
change-id: 20231027-sc7280-remoteprocs-048208cc1e13

Best regards,
-- 
Luca Weiss 




[PATCH v4 3/3] remoteproc: qcom: pas: Add SM8650 remoteproc support

2023-12-08 Thread Neil Armstrong
Add DSP Peripheral Authentication Service support for the SM8650 platform.

Reviewed-by: Dmitry Baryshkov 
Signed-off-by: Neil Armstrong 
---
 drivers/remoteproc/qcom_q6v5_pas.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 46371f1ad32d..01effbd969a5 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -1197,6 +1197,53 @@ static const struct adsp_data sm8550_mpss_resource = {
.region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,
 };
 
+static const struct adsp_data sm8650_cdsp_resource = {
+   .crash_reason_smem = 601,
+   .firmware_name = "cdsp.mdt",
+   .dtb_firmware_name = "cdsp_dtb.mdt",
+   .pas_id = 18,
+   .dtb_pas_id = 0x25,
+   .minidump_id = 7,
+   .auto_boot = true,
+   .proxy_pd_names = (char*[]){
+   "cx",
+   "mxc",
+   "nsp",
+   NULL
+   },
+   .load_state = "cdsp",
+   .ssr_name = "cdsp",
+   .sysmon_name = "cdsp",
+   .ssctl_id = 0x17,
+   .region_assign_idx = 2,
+   .region_assign_count = 1,
+   .region_assign_shared = true,
+   .region_assign_vmid = QCOM_SCM_VMID_CDSP,
+};
+
+static const struct adsp_data sm8650_mpss_resource = {
+   .crash_reason_smem = 421,
+   .firmware_name = "modem.mdt",
+   .dtb_firmware_name = "modem_dtb.mdt",
+   .pas_id = 4,
+   .dtb_pas_id = 0x26,
+   .minidump_id = 3,
+   .auto_boot = false,
+   .decrypt_shutdown = true,
+   .proxy_pd_names = (char*[]){
+   "cx",
+   "mss",
+   NULL
+   },
+   .load_state = "modem",
+   .ssr_name = "mpss",
+   .sysmon_name = "modem",
+   .ssctl_id = 0x12,
+   .region_assign_idx = 2,
+   .region_assign_count = 2,
+   .region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,
+};
+
 static const struct of_device_id adsp_of_match[] = {
{ .compatible = "qcom,msm8226-adsp-pil", .data = _resource_init},
{ .compatible = "qcom,msm8953-adsp-pil", .data = 
_adsp_resource},
@@ -1249,6 +1296,9 @@ static const struct of_device_id adsp_of_match[] = {
{ .compatible = "qcom,sm8550-adsp-pas", .data = _adsp_resource},
{ .compatible = "qcom,sm8550-cdsp-pas", .data = _cdsp_resource},
{ .compatible = "qcom,sm8550-mpss-pas", .data = _mpss_resource},
+   { .compatible = "qcom,sm8650-adsp-pas", .data = _adsp_resource},
+   { .compatible = "qcom,sm8650-cdsp-pas", .data = _cdsp_resource},
+   { .compatible = "qcom,sm8650-mpss-pas", .data = _mpss_resource},
{ },
 };
 MODULE_DEVICE_TABLE(of, adsp_of_match);

-- 
2.34.1




[PATCH v4 2/3] remoteproc: qcom: pas: make region assign more generic

2023-12-08 Thread Neil Armstrong
The current memory region assign code only supports a single
memory region.

But new platforms introduce more regions to make the
memory requirements more flexible for various use cases.
Those new platforms also share the memory region between the
DSP and HLOS.

To handle this, make the region assign code more generic in order
to support more than a single memory region and also permit
setting the regions' permissions as shared.

Reviewed-by: Mukesh Ojha 
Signed-off-by: Neil Armstrong 
---
 drivers/remoteproc/qcom_q6v5_pas.c | 100 -
 1 file changed, 66 insertions(+), 34 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 913a5d2068e8..46371f1ad32d 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -33,6 +33,8 @@
 
 #define ADSP_DECRYPT_SHUTDOWN_DELAY_MS 100
 
+#define MAX_ASSIGN_COUNT 2
+
 struct adsp_data {
int crash_reason_smem;
const char *firmware_name;
@@ -51,6 +53,9 @@ struct adsp_data {
int ssctl_id;
 
int region_assign_idx;
+   int region_assign_count;
+   bool region_assign_shared;
+   int region_assign_vmid;
 };
 
 struct qcom_adsp {
@@ -87,15 +92,18 @@ struct qcom_adsp {
phys_addr_t dtb_mem_phys;
phys_addr_t mem_reloc;
phys_addr_t dtb_mem_reloc;
-   phys_addr_t region_assign_phys;
+   phys_addr_t region_assign_phys[MAX_ASSIGN_COUNT];
void *mem_region;
void *dtb_mem_region;
size_t mem_size;
size_t dtb_mem_size;
-   size_t region_assign_size;
+   size_t region_assign_size[MAX_ASSIGN_COUNT];
 
int region_assign_idx;
-   u64 region_assign_perms;
+   int region_assign_count;
+   bool region_assign_shared;
+   int region_assign_vmid;
+   u64 region_assign_perms[MAX_ASSIGN_COUNT];
 
struct qcom_rproc_glink glink_subdev;
struct qcom_rproc_subdev smd_subdev;
@@ -590,37 +598,53 @@ static int adsp_alloc_memory_region(struct qcom_adsp 
*adsp)
 
 static int adsp_assign_memory_region(struct qcom_adsp *adsp)
 {
-   struct reserved_mem *rmem = NULL;
-   struct qcom_scm_vmperm perm;
+   struct qcom_scm_vmperm perm[MAX_ASSIGN_COUNT];
struct device_node *node;
+   unsigned int perm_size;
+   int offset;
int ret;
 
if (!adsp->region_assign_idx)
return 0;
 
-   node = of_parse_phandle(adsp->dev->of_node, "memory-region", 
adsp->region_assign_idx);
-   if (node)
-   rmem = of_reserved_mem_lookup(node);
-   of_node_put(node);
-   if (!rmem) {
-   dev_err(adsp->dev, "unable to resolve shareable 
memory-region\n");
-   return -EINVAL;
-   }
+   for (offset = 0; offset < adsp->region_assign_count; ++offset) {
+   struct reserved_mem *rmem = NULL;
+
+   node = of_parse_phandle(adsp->dev->of_node, "memory-region",
+   adsp->region_assign_idx + offset);
+   if (node)
+   rmem = of_reserved_mem_lookup(node);
+   of_node_put(node);
+   if (!rmem) {
+   dev_err(adsp->dev, "unable to resolve shareable 
memory-region index %d\n",
+   offset);
+   return -EINVAL;
+   }
 
-   perm.vmid = QCOM_SCM_VMID_MSS_MSA;
-   perm.perm = QCOM_SCM_PERM_RW;
+   if (adsp->region_assign_shared)  {
+   perm[0].vmid = QCOM_SCM_VMID_HLOS;
+   perm[0].perm = QCOM_SCM_PERM_RW;
+   perm[1].vmid = adsp->region_assign_vmid;
+   perm[1].perm = QCOM_SCM_PERM_RW;
+   perm_size = 2;
+   } else {
+   perm[0].vmid = adsp->region_assign_vmid;
+   perm[0].perm = QCOM_SCM_PERM_RW;
+   perm_size = 1;
+   }
 
-   adsp->region_assign_phys = rmem->base;
-   adsp->region_assign_size = rmem->size;
-   adsp->region_assign_perms = BIT(QCOM_SCM_VMID_HLOS);
+   adsp->region_assign_phys[offset] = rmem->base;
+   adsp->region_assign_size[offset] = rmem->size;
+   adsp->region_assign_perms[offset] = BIT(QCOM_SCM_VMID_HLOS);
 
-   ret = qcom_scm_assign_mem(adsp->region_assign_phys,
- adsp->region_assign_size,
- &adsp->region_assign_perms,
- &perm, 1);
-   if (ret < 0) {
-   dev_err(adsp->dev, "assign memory failed\n");
-   return ret;
+   ret = qcom_scm_assign_mem(adsp->region_assign_phys[offset],
+ adsp->region_assign_size[offset],
+ &adsp->region_assign_perms[offset],
+ perm, perm_size);
+  

[PATCH v4 1/3] dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS

2023-12-08 Thread Neil Armstrong
Document the DSP Peripheral Authentication Service on the SM8650 Platform.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Neil Armstrong 
---
 .../bindings/remoteproc/qcom,sm8550-pas.yaml   | 44 +-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml 
b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
index 58120829fb06..4e8ce9e7e9fa 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
@@ -19,6 +19,9 @@ properties:
   - qcom,sm8550-adsp-pas
   - qcom,sm8550-cdsp-pas
   - qcom,sm8550-mpss-pas
+  - qcom,sm8650-adsp-pas
+  - qcom,sm8650-cdsp-pas
+  - qcom,sm8650-mpss-pas
 
   reg:
 maxItems: 1
@@ -49,6 +52,7 @@ properties:
   - description: Memory region for main Firmware authentication
   - description: Memory region for Devicetree Firmware authentication
   - description: DSM Memory region
+  - description: DSM Memory region 2
 
 required:
   - compatible
@@ -63,6 +67,7 @@ allOf:
   enum:
 - qcom,sm8550-adsp-pas
 - qcom,sm8550-cdsp-pas
+- qcom,sm8650-adsp-pas
 then:
   properties:
 interrupts:
@@ -71,7 +76,26 @@ allOf:
   maxItems: 5
 memory-region:
   maxItems: 2
-else:
+  - if:
+  properties:
+compatible:
+  enum:
+- qcom,sm8650-cdsp-pas
+then:
+  properties:
+interrupts:
+  maxItems: 5
+interrupt-names:
+  maxItems: 5
+memory-region:
+  minItems: 3
+  maxItems: 3
+  - if:
+  properties:
+compatible:
+  enum:
+- qcom,sm8550-mpss-pas
+then:
   properties:
 interrupts:
   minItems: 6
@@ -79,12 +103,28 @@ allOf:
   minItems: 6
 memory-region:
   minItems: 3
+  maxItems: 3
+  - if:
+  properties:
+compatible:
+  enum:
+- qcom,sm8650-mpss-pas
+then:
+  properties:
+interrupts:
+  minItems: 6
+interrupt-names:
+  minItems: 6
+memory-region:
+  minItems: 4
+  maxItems: 4
 
   - if:
   properties:
 compatible:
   enum:
 - qcom,sm8550-adsp-pas
+- qcom,sm8650-adsp-pas
 then:
   properties:
 power-domains:
@@ -101,6 +141,7 @@ allOf:
 compatible:
   enum:
 - qcom,sm8550-mpss-pas
+- qcom,sm8650-mpss-pas
 then:
   properties:
 power-domains:
@@ -116,6 +157,7 @@ allOf:
 compatible:
   enum:
 - qcom,sm8550-cdsp-pas
+- qcom,sm8650-cdsp-pas
 then:
   properties:
 power-domains:

-- 
2.34.1




[PATCH v4 0/3] remoteproc: qcom: Introduce DSP support for SM8650

2023-12-08 Thread Neil Armstrong
Add the bindings and driver changes for DSP support on the
SM8650 platform in order to enable the aDSP, cDSP and MPSS
subsystems to boot.

Compared to SM8550, SM8650 uses the same dual firmware files
(dtb file and main firmware), but the memory zone requirements have changed:
- cDSP: now requires 2 memory zones to be configured as shared
  between the cDSP and the HLOS subsystem
- MPSS: In addition to the memory zone required for the SM8550
  MPSS, another one is required to be configured for MPSS
  usage only.

In order to handle this and avoid code duplication, the region_assign_*
code path has been made more generic and is able to handle multiple
DSP-only memory zones (for MPSS) or DSP-HLOS shared memory zones (cDSP)
in the same region_assign functions.

Dependencies: None

For convenience, a regularly refreshed linux-next based git tree containing
all the SM8650 related work is available at:
https://git.codelinaro.org/neil.armstrong/linux/-/tree/topic/sm8650/upstream/integ

Signed-off-by: Neil Armstrong 
---
Changes in v4:
- Collected review from Mukesh Ojha
- Fixed adsp_unassign_memory_region() as suggested by Mukesh Ojha
- Link to v3: 
https://lore.kernel.org/r/20231106-topic-sm8650-upstream-remoteproc-v3-0-dbd4cabae...@linaro.org

Changes in v3:
- Collected bindings review tags
- Small fixes suggested by Mukesh Ojha
- Link to v2: 
https://lore.kernel.org/r/20231030-topic-sm8650-upstream-remoteproc-v2-0-609ee572e...@linaro.org

Changes in v2:
- Fixed sm8650 entries in allOf:if:then to match Krzysztof's comments
- Collected reviewed-by on patch 3
- Link to v1: 
https://lore.kernel.org/r/20231025-topic-sm8650-upstream-remoteproc-v1-0-a8d20e4ce...@linaro.org

---
Neil Armstrong (3):
  dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
  remoteproc: qcom: pas: make region assign more generic
  remoteproc: qcom: pas: Add SM8650 remoteproc support

 .../bindings/remoteproc/qcom,sm8550-pas.yaml   |  44 +-
 drivers/remoteproc/qcom_q6v5_pas.c | 150 -
 2 files changed, 159 insertions(+), 35 deletions(-)
---
base-commit: 0f5f12ac05f36f117e793656c3f560625e927f1b
change-id: 20231016-topic-sm8650-upstream-remoteproc-66a87eeb6fee

Best regards,
-- 
Neil Armstrong 




Re: [PATCH v2 01/33] ftrace: Unpoison ftrace_regs in ftrace_ops_list_func()

2023-12-08 Thread Steven Rostedt
On Fri, 8 Dec 2023 15:16:10 +0100
Alexander Potapenko  wrote:

> On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
> >
> > Architectures use assembly code to initialize ftrace_regs and call
> > ftrace_ops_list_func(). Therefore, from the KMSAN's point of view,
> > ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
> > KMSAN warnings when running the ftrace testsuite.  
> 
> I couldn't reproduce these warnings on x86, hope you really need this
> change on s390 :)

On x86, ftrace_regs sits on the stack. And IIUC, s390 doesn't have the same
concept of a "stack" as other architectures. Perhaps that's the reason s390
needs this?

-- Steve



Re: [PATCH v2 19/33] lib/zlib: Unpoison DFLTCC output buffers

2023-12-08 Thread Alexander Potapenko
On Fri, Dec 8, 2023 at 3:14 PM Ilya Leoshkevich  wrote:
>
> On Fri, 2023-12-08 at 14:32 +0100, Alexander Potapenko wrote:
> > On Tue, Nov 21, 2023 at 11:07 PM Ilya Leoshkevich 
> > wrote:
> > >
> > > The constraints of the DFLTCC inline assembly are not precise: they
> > > do not communicate the size of the output buffers to the compiler,
> > > so
> > > it cannot automatically instrument it.
> >
> > KMSAN usually does a poor job instrumenting inline assembly.
> > Wouldn't it be better to switch to a pure C ZLIB implementation, making
> > ZLIB_DFLTCC depend on !KMSAN?
>
> Normally I would agree, but the kernel DFLTCC code base is synced with
> the zlib-ng code base to the extent that it uses the zlib-ng code style
> instead of the kernel code style, and MSAN annotations are already a
> part of the zlib-ng code base. So I would prefer to keep them for
> consistency.

Hm, I didn't realize this code is being taken from elsewhere.
If so, maybe we should come up with an annotation that can be
contributed to zlib-ng, so that it doesn't cause merge conflicts every
time Mikhail is doing an update?
(leaving this up to you to decide).

If you decide to go with the current solution, please consider adding
an #include for kmsan-checks.h, which introduces
kmsan_unpoison_memory().
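
For reference, a minimal sketch of the kind of annotation I have in
mind, assuming the usual kmsan-checks.h API; the buffer and length
names are placeholders rather than the actual DFLTCC code:

#include <linux/kmsan-checks.h>
#include <linux/types.h>

/*
 * Sketch only: after the DFLTCC inline assembly has written into an
 * output buffer the compiler cannot see through, tell KMSAN that those
 * bytes are now initialized. "out_buf" and "out_len" stand in for
 * whatever the real code tracks; this is a no-op without CONFIG_KMSAN.
 */
static inline void dfltcc_unpoison_output(void *out_buf, size_t out_len)
{
	kmsan_unpoison_memory(out_buf, out_len);
}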



Re: [PATCH v2 26/33] s390/ftrace: Unpoison ftrace_regs in kprobe_ftrace_handler()

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> s390 uses assembly code to initialize ftrace_regs and call
> kprobe_ftrace_handler(). Therefore, from the KMSAN's point of view,
> ftrace_regs is poisoned on kprobe_ftrace_handler() entry. This causes
> KMSAN warnings when running the ftrace testsuite.
>
> Fix by trusting the assembly code and always unpoisoning ftrace_regs in
> kprobe_ftrace_handler().
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH v2 01/33] ftrace: Unpoison ftrace_regs in ftrace_ops_list_func()

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> Architectures use assembly code to initialize ftrace_regs and call
> ftrace_ops_list_func(). Therefore, from the KMSAN's point of view,
> ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
> KMSAN warnings when running the ftrace testsuite.

I couldn't reproduce these warnings on x86, hope you really need this
change on s390 :)

> Fix by trusting the architecture-specific assembly code and always
> unpoisoning ftrace_regs in ftrace_ops_list_func.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 
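
P.S. For readers following along, the idea boils down to something like
the sketch below. This is my reconstruction, not the actual patch; the
function signature follows kernel/trace/ftrace.c as far as I know:

#include <linux/ftrace.h>
#include <linux/kmsan-checks.h>

/*
 * Sketch: the architecture assembly that builds ftrace_regs is
 * invisible to KMSAN, so declare the whole structure initialized on
 * entry, before the usual walk over the ftrace_ops list runs.
 */
static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
				 struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	kmsan_unpoison_memory(fregs, sizeof(*fregs));
	/* ... existing list-walking logic unchanged ... */
}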



Re: [PATCH v2 19/33] lib/zlib: Unpoison DFLTCC output buffers

2023-12-08 Thread Ilya Leoshkevich
On Fri, 2023-12-08 at 14:32 +0100, Alexander Potapenko wrote:
> On Tue, Nov 21, 2023 at 11:07 PM Ilya Leoshkevich 
> wrote:
> > 
> > The constraints of the DFLTCC inline assembly are not precise: they
> > do not communicate the size of the output buffers to the compiler,
> > so
> > it cannot automatically instrument it.
> 
> KMSAN usually does a poor job instrumenting inline assembly.
> Wouldn't it be better to switch to a pure C ZLIB implementation, making
> ZLIB_DFLTCC depend on !KMSAN?

Normally I would agree, but the kernel DFLTCC code base is synced with
the zlib-ng code base to the extent that it uses the zlib-ng code style
instead of the kernel code style, and MSAN annotations are already a
part of the zlib-ng code base. So I would prefer to keep them for
consistency.

The code is also somewhat tricky in the area of buffer management, so I
find it beneficial to have it checked for uninitialized memory
accesses.



Re: [PATCH v2 13/33] kmsan: Introduce memset_no_sanitize_memory()

2023-12-08 Thread Ilya Leoshkevich
On Fri, 2023-12-08 at 14:48 +0100, Alexander Potapenko wrote:
> On Tue, Nov 21, 2023 at 11:06 PM Ilya Leoshkevich 
> wrote:
> > 
> > Add a wrapper for memset() that prevents unpoisoning.
> 
> We have __memset() already, won't it work for this case?

A problem with __memset() is that, at least for me, it always ends
up being a call. There is a use case where we need to write only 1
byte, so I thought that introducing a call there (when compiling
without KMSAN) would be unacceptable.

> On the other hand, I am not sure you want to preserve the redzone in
> its previous state (unless it's known to be poisoned).

That's exactly the problem with unpoisoning: it removes the distinction
between a new allocation and a UAF.

> You might consider explicitly unpoisoning the redzone instead.

That was my first attempt, but it resulted in test failures due to the
above.
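
For the record, that attempt looked roughly like the sketch below. This
is a simplified reconstruction, not the exact diff I tested; it presumes
the mm/slub.c internals around init_object() ('s', 'p', 'val' and
red_left_pad):

#include <linux/kmsan-checks.h>

/* Simplified sketch of the rejected approach, as it would sit inside
 * mm/slub.c: explicitly mark the left redzone initialized before
 * writing it. This erases the new-allocation vs. use-after-free
 * distinction, which is what made the KMSAN tests fail.
 */
static void init_redzone_unpoisoned(struct kmem_cache *s, u8 *p, u8 val)
{
	kmsan_unpoison_memory(p - s->red_left_pad, s->red_left_pad);
	memset(p - s->red_left_pad, val, s->red_left_pad);
}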

> ...
> 
> > +__no_sanitize_memory
> > +static inline void *memset_no_sanitize_memory(void *s, int c,
> > size_t n)
> > +{
> > +   return memset(s, c, n);
> > +}
> 
> I think depending on the compiler optimizations this might end up
> being a call to the normal memset, which would still change the shadow
> bytes.

Interesting, do you have some specific scenario in mind? I vaguely
remember that in the past there were cases when sanitizer annotations
were lost after inlining, but I thought they were sorted out?

And, in any case, if this were to happen, would not it be considered a
compiler bug that needs fixing there, and not in the kernel?



Re: [PATCH v2 17/33] mm: kfence: Disable KMSAN when checking the canary

2023-12-08 Thread Alexander Potapenko
On Fri, Dec 8, 2023 at 1:53 PM Alexander Potapenko  wrote:
>
> On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
> >
> > KMSAN warns about check_canary() accessing the canary.
> >
> > The reason is that, even though set_canary() is properly instrumented
> > and sets shadow, slub explicitly poisons the canary's address range
> > afterwards.
> >
> > Unpoisoning the canary is not the right thing to do: only
> > check_canary() is supposed to ever touch it. Instead, disable KMSAN
> > checks around canary read accesses.
> >
> > Signed-off-by: Ilya Leoshkevich 
> Reviewed-by: Alexander Potapenko 

and even

Tested-by: Alexander Potapenko 



Re: [PATCH v2 14/33] kmsan: Support SLAB_POISON

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> Avoid false KMSAN negatives with SLUB_DEBUG by allowing
> kmsan_slab_free() to poison the freed memory, and by preventing
> init_object() from unpoisoning new allocations. The usage of
> memset_no_sanitize_memory() does not degrade the generated code
> quality.
>
> There are two alternatives to this approach. First, init_object()
> can be marked with __no_sanitize_memory. This annotation should be used
> with great care, because it drops all instrumentation from the
> function, and any shadow writes will be lost. Even though this is not a
> concern with the current init_object() implementation, this may change
> in the future.
>
> Second, kmsan_poison_memory() calls may be added after memset() calls.
> The downside is that init_object() is called from
> free_debug_processing(), in which case poisoning will erase the
> distinction between simply uninitialized memory and UAF.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  mm/kmsan/hooks.c |  2 +-
>  mm/slub.c| 10 ++
>  2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> index 7b5814412e9f..7a30274b893c 100644
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void *object)
> return;
>
> /* RCU slabs could be legally used after free within the RCU period */
> -   if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
> +   if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> return;
> /*
>  * If there's a constructor, freed memory must remain in the same 
> state
> diff --git a/mm/slub.c b/mm/slub.c
> index 63d281dfacdb..169e5f645ea8 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1030,7 +1030,8 @@ static void init_object(struct kmem_cache *s, void 
> *object, u8 val)
> unsigned int poison_size = s->object_size;
>
> if (s->flags & SLAB_RED_ZONE) {
> -   memset(p - s->red_left_pad, val, s->red_left_pad);
> +   memset_no_sanitize_memory(p - s->red_left_pad, val,

As I wrote in patch 13/33, let's try to use __memset() here (with a
comment that we want to preserve the previously poisoned memory)



Re: [PATCH v2 13/33] kmsan: Introduce memset_no_sanitize_memory()

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:06 PM Ilya Leoshkevich  wrote:
>
> Add a wrapper for memset() that prevents unpoisoning.

We have __memset() already, won't it work for this case?
On the other hand, I am not sure you want to preserve the redzone in
its previous state (unless it's known to be poisoned).
You might consider explicitly unpoisoning the redzone instead.

...

> +__no_sanitize_memory
> +static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
> +{
> +   return memset(s, c, n);
> +}

I think depending on the compiler optimizations this might end up
being a call to the normal memset, which would still change the shadow
bytes.



Re: [PATCH v2 24/33] s390/checksum: Add a KMSAN check

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> Add a KMSAN check to the CKSM inline assembly, similar to how it was
> done for ASAN in commit e42ac7789df6 ("s390/checksum: always use cksm
> instruction").
>
> Acked-by: Alexander Gordeev 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH v2 19/33] lib/zlib: Unpoison DFLTCC output buffers

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:07 PM Ilya Leoshkevich  wrote:
>
> The constraints of the DFLTCC inline assembly are not precise: they
> do not communicate the size of the output buffers to the compiler, so
> it cannot automatically instrument it.

KMSAN usually does a poor job instrumenting inline assembly.
Wouldn't it be better to switch to a pure C ZLIB implementation, making
ZLIB_DFLTCC depend on !KMSAN?



Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type

2023-12-08 Thread Maxime Coquelin




On 12/8/23 13:26, Michael S. Tsirkin wrote:

On Fri, Dec 08, 2023 at 01:23:00PM +0100, Maxime Coquelin wrote:



On 12/8/23 12:05, Michael S. Tsirkin wrote:

On Fri, Dec 08, 2023 at 12:01:15PM +0100, Maxime Coquelin wrote:

Hello Paul,

On 11/8/23 03:31, Paul Moore wrote:

On Oct 20, 2023 "Michael S. Tsirkin"  wrote:


This patch introduces LSM hooks for device creation,
destruction and opening operations, checking that the
application is allowed to perform these operations for
the Virtio device type.

Signed-off-by: Maxime Coquelin 
---
drivers/vdpa/vdpa_user/vduse_dev.c  | 12 +++
include/linux/lsm_hook_defs.h   |  4 +++
include/linux/security.h| 15 
security/security.c | 42 ++
security/selinux/hooks.c| 55 +
security/selinux/include/classmap.h |  2 ++
6 files changed, 130 insertions(+)


My apologies for the late reply, I've been trying to work my way through
the review backlog but it has been taking longer than expected; comments
below ...


No worries, I have also been busy these days.


diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 2aa0e219d721..65d9262a37f7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -21,6 +21,7 @@
 *  Copyright (C) 2016 Mellanox Technologies
 */
+#include "av_permissions.h"
#include 
#include 
#include 
@@ -92,6 +93,7 @@
#include 
#include 
#include 
+#include 
#include "avc.h"
#include "objsec.h"
@@ -6950,6 +6952,56 @@ static int selinux_uring_cmd(struct io_uring_cmd *ioucmd)
}
#endif /* CONFIG_IO_URING */
+static int vduse_check_device_type(u32 sid, u32 device_id)
+{
+   u32 requested;
+
+   if (device_id == VIRTIO_ID_NET)
+   requested = VDUSE__NET;
+   else if (device_id == VIRTIO_ID_BLOCK)
+   requested = VDUSE__BLOCK;
+   else
+   return -EINVAL;
+
+   return avc_has_perm(sid, sid, SECCLASS_VDUSE, requested, NULL);
+}
+
+static int selinux_vduse_dev_create(u32 device_id)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVCREATE, NULL);
+   if (ret)
+   return ret;
+
+   return vduse_check_device_type(sid, device_id);
+}


I see there has been some discussion about the need for a dedicated
create hook as opposed to using the existing ioctl controls.  I think
one important point that has been missing from the discussion is the
idea of labeling the newly created device.  Unfortunately prior to a
few minutes ago I hadn't ever looked at VDUSE so please correct me if
I get some things wrong :)

   From what I can see userspace creates a new VDUSE device with
ioctl(VDUSE_CREATE_DEV), which triggers the creation of a new
/dev/vduse/XXX device which will be labeled according to the udev
and SELinux configuration, likely with a generic udev label.  My
question is if we want to be able to uniquely label each VDUSE
device based on the process that initiates the device creation
with the call to ioctl()?  If that is the case, we would need a
create hook not only to control the creation of the device, but to
record the triggering process' label in the new device; this label
would then be used in subsequent VDUSE open and destroy operations.
The normal device file I/O operations would still be subject to the
standard SELinux file I/O permissions using the device file label
assigned by systemd/udev when the device was created.


I don't think we need a unique label for VDUSE devices, but maybe
Michael thinks otherwise?


I don't know.
All this is consumed by libvirt, you need to ask these guys.


I think it is not consumed by libvirt, at least not in the usecases I
have in mind. For networking devices, it will be consumed by OVS.

Maxime


OK, ovs then :)



Adding Aaron, our go-to person for SELinux-related topics for OVS, but I 
think we don't need to do relabeling for VDUSE chardevs.


Aaron, do you confirm?

Maxime




Re: [PATCH v5 4/5] x86/paravirt: switch mixed paravirt/alternative calls to alternative_2

2023-12-08 Thread Borislav Petkov
On Fri, Dec 08, 2023 at 12:53:47PM +0100, Juergen Gross wrote:
> Took me a while to find it. Patch 5 was repairing the issue again

Can't have that. Any patch in any set must build and boot successfully
for bisection reasons.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 17/33] mm: kfence: Disable KMSAN when checking the canary

2023-12-08 Thread Alexander Potapenko
On Tue, Nov 21, 2023 at 11:02 PM Ilya Leoshkevich  wrote:
>
> KMSAN warns about check_canary() accessing the canary.
>
> The reason is that, even though set_canary() is properly instrumented
> and sets shadow, slub explicitly poisons the canary's address range
> afterwards.
>
> Unpoisoning the canary is not the right thing to do: only
> check_canary() is supposed to ever touch it. Instead, disable KMSAN
> checks around canary read accesses.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type

2023-12-08 Thread Michael S. Tsirkin
On Fri, Dec 08, 2023 at 01:23:00PM +0100, Maxime Coquelin wrote:
> 
> 
> On 12/8/23 12:05, Michael S. Tsirkin wrote:
> > On Fri, Dec 08, 2023 at 12:01:15PM +0100, Maxime Coquelin wrote:
> > > Hello Paul,
> > > 
> > > On 11/8/23 03:31, Paul Moore wrote:
> > > > On Oct 20, 2023 "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > This patch introduces LSM hooks for device creation,
> > > > > destruction and opening operations, checking that the
> > > > > application is allowed to perform these operations for
> > > > > the Virtio device type.
> > > > > 
> > > > > Signed-off-by: Maxime Coquelin 
> > > > > ---
> > > > >drivers/vdpa/vdpa_user/vduse_dev.c  | 12 +++
> > > > >include/linux/lsm_hook_defs.h   |  4 +++
> > > > >include/linux/security.h| 15 
> > > > >security/security.c | 42 ++
> > > > >security/selinux/hooks.c| 55 
> > > > > +
> > > > >security/selinux/include/classmap.h |  2 ++
> > > > >6 files changed, 130 insertions(+)
> > > > 
> > > > My apologies for the late reply, I've been trying to work my way through
> > > > the review backlog but it has been taking longer than expected; comments
> > > > below ...
> > > 
> > > No worries, I have also been busy these days.
> > > 
> > > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > > index 2aa0e219d721..65d9262a37f7 100644
> > > > > --- a/security/selinux/hooks.c
> > > > > +++ b/security/selinux/hooks.c
> > > > > @@ -21,6 +21,7 @@
> > > > > *  Copyright (C) 2016 Mellanox Technologies
> > > > > */
> > > > > +#include "av_permissions.h"
> > > > >#include 
> > > > >#include 
> > > > >#include 
> > > > > @@ -92,6 +93,7 @@
> > > > >#include 
> > > > >#include 
> > > > >#include 
> > > > > +#include 
> > > > >#include "avc.h"
> > > > >#include "objsec.h"
> > > > > @@ -6950,6 +6952,56 @@ static int selinux_uring_cmd(struct 
> > > > > io_uring_cmd *ioucmd)
> > > > >}
> > > > >#endif /* CONFIG_IO_URING */
> > > > > +static int vduse_check_device_type(u32 sid, u32 device_id)
> > > > > +{
> > > > > + u32 requested;
> > > > > +
> > > > > + if (device_id == VIRTIO_ID_NET)
> > > > > + requested = VDUSE__NET;
> > > > > + else if (device_id == VIRTIO_ID_BLOCK)
> > > > > + requested = VDUSE__BLOCK;
> > > > > + else
> > > > > + return -EINVAL;
> > > > > +
> > > > > + return avc_has_perm(sid, sid, SECCLASS_VDUSE, requested, NULL);
> > > > > +}
> > > > > +
> > > > > +static int selinux_vduse_dev_create(u32 device_id)
> > > > > +{
> > > > > + u32 sid = current_sid();
> > > > > + int ret;
> > > > > +
> > > > > + ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVCREATE, 
> > > > > NULL);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + return vduse_check_device_type(sid, device_id);
> > > > > +}
> > > > 
> > > > I see there has been some discussion about the need for a dedicated
> > > > create hook as opposed to using the existing ioctl controls.  I think
> > > > one important point that has been missing from the discussion is the
> > > > idea of labeling the newly created device.  Unfortunately prior to a
> > > > few minutes ago I hadn't ever looked at VDUSE so please correct me if
> > > > I get some things wrong :)
> > > > 
> > > >   From what I can see userspace creates a new VDUSE device with
> > > > ioctl(VDUSE_CREATE_DEV), which triggers the creation of a new
> > > > /dev/vduse/XXX device which will be labeled according to the udev
> > > > and SELinux configuration, likely with a generic udev label.  My
> > > > question is if we want to be able to uniquely label each VDUSE
> > > > device based on the process that initiates the device creation
> > > > with the call to ioctl()?  If that is the case, we would need a
> > > > create hook not only to control the creation of the device, but to
> > > > record the triggering process' label in the new device; this label
> > > > would then be used in subsequent VDUSE open and destroy operations.
> > > > The normal device file I/O operations would still be subject to the
> > > > standard SELinux file I/O permissions using the device file label
> > > > assigned by systemd/udev when the device was created.
> > > 
> > > I don't think we need a unique label for VDUSE devices, but maybe
> > > Michael thinks otherwise?
> > 
> > I don't know.
> > All this is consumed by libvirt, you need to ask these guys.
> 
> I think it is not consumed by libvirt, at least not in the usecases I
> have in mind. For networking devices, it will be consumed by OVS.
> 
> Maxime

OK, ovs then :)

-- 
MST




Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type

2023-12-08 Thread Maxime Coquelin




On 12/8/23 12:05, Michael S. Tsirkin wrote:

On Fri, Dec 08, 2023 at 12:01:15PM +0100, Maxime Coquelin wrote:

Hello Paul,

On 11/8/23 03:31, Paul Moore wrote:

On Oct 20, 2023 "Michael S. Tsirkin"  wrote:


This patch introduces LSM hooks for device creation,
destruction and opening operations, checking that the
application is allowed to perform these operations for
the Virtio device type.

Signed-off-by: Maxime Coquelin 
---
   drivers/vdpa/vdpa_user/vduse_dev.c  | 12 +++
   include/linux/lsm_hook_defs.h   |  4 +++
   include/linux/security.h| 15 
   security/security.c | 42 ++
   security/selinux/hooks.c| 55 +
   security/selinux/include/classmap.h |  2 ++
   6 files changed, 130 insertions(+)


My apologies for the late reply, I've been trying to work my way through
the review backlog but it has been taking longer than expected; comments
below ...


No worries, I have also been busy these days.


diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 2aa0e219d721..65d9262a37f7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -21,6 +21,7 @@
*  Copyright (C) 2016 Mellanox Technologies
*/
+#include "av_permissions.h"
   #include 
   #include 
   #include 
@@ -92,6 +93,7 @@
   #include 
   #include 
   #include 
+#include 
   #include "avc.h"
   #include "objsec.h"
@@ -6950,6 +6952,56 @@ static int selinux_uring_cmd(struct io_uring_cmd *ioucmd)
   }
   #endif /* CONFIG_IO_URING */
+static int vduse_check_device_type(u32 sid, u32 device_id)
+{
+   u32 requested;
+
+   if (device_id == VIRTIO_ID_NET)
+   requested = VDUSE__NET;
+   else if (device_id == VIRTIO_ID_BLOCK)
+   requested = VDUSE__BLOCK;
+   else
+   return -EINVAL;
+
+   return avc_has_perm(sid, sid, SECCLASS_VDUSE, requested, NULL);
+}
+
+static int selinux_vduse_dev_create(u32 device_id)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVCREATE, NULL);
+   if (ret)
+   return ret;
+
+   return vduse_check_device_type(sid, device_id);
+}


I see there has been some discussion about the need for a dedicated
create hook as opposed to using the existing ioctl controls.  I think
one important point that has been missing from the discussion is the
idea of labeling the newly created device.  Unfortunately prior to a
few minutes ago I hadn't ever looked at VDUSE so please correct me if
I get some things wrong :)

  From what I can see userspace creates a new VDUSE device with
ioctl(VDUSE_CREATE_DEV), which triggers the creation of a new
/dev/vduse/XXX device which will be labeled according to the udev
and SELinux configuration, likely with a generic udev label.  My
question is if we want to be able to uniquely label each VDUSE
device based on the process that initiates the device creation
with the call to ioctl()?  If that is the case, we would need a
create hook not only to control the creation of the device, but to
record the triggering process' label in the new device; this label
would then be used in subsequent VDUSE open and destroy operations.
The normal device file I/O operations would still be subject to the
standard SELinux file I/O permissions using the device file label
assigned by systemd/udev when the device was created.


I don't think we need a unique label for VDUSE devices, but maybe
Michael thinks otherwise?


I don't know.
All this is consumed by libvirt, you need to ask these guys.


I think it is not consumed by libvirt, at least not in the usecases I
have in mind. For networking devices, it will be consumed by OVS.

Maxime




Re: [PATCH v5 4/5] x86/paravirt: switch mixed paravirt/alternative calls to alternative_2

2023-12-08 Thread Juergen Gross

On 06.12.23 12:08, Borislav Petkov wrote:

On Wed, Nov 29, 2023 at 02:33:31PM +0100, Juergen Gross wrote:

Instead of stacking alternative and paravirt patching, use the new
ALT_FLAG_CALL flag to switch those mixed calls to pure alternative
handling.

This eliminates the need to be careful regarding the sequence of
alternative and paravirt patching.

Signed-off-by: Juergen Gross 
Acked-by: Peter Zijlstra (Intel) 
---
V5:
- remove no longer needed extern declarations from alternative.c
   (Boris Petkov)
- add comment about ALTERNATIVE[_2]() macro usage (Boris Petkov)
- rebase to tip/master (Boris Petkov)
---
  arch/x86/include/asm/alternative.h|  5 ++--
  arch/x86/include/asm/paravirt.h   |  9 --
  arch/x86/include/asm/paravirt_types.h | 40 +++
  arch/x86/kernel/alternative.c |  1 -
  arch/x86/kernel/callthunks.c  | 17 ++--
  arch/x86/kernel/module.c  | 20 --
  6 files changed, 44 insertions(+), 48 deletions(-)


After this one: (.config is attached).

...

Ouch.

Took me a while to find it. Patch 5 was repairing the issue again, and I tested
more thoroughly only with all 5 patches applied.


Juergen





Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-12-08 Thread Tobias Huschle
On Fri, Dec 08, 2023 at 05:31:18AM -0500, Michael S. Tsirkin wrote:
> On Fri, Dec 08, 2023 at 10:24:16AM +0100, Tobias Huschle wrote:
> > On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
> > > > 3. vhost looping endlessly, waiting for kworker to be scheduled
> > > > 
> > > > I dug a little deeper on what the vhost is doing. I'm not an expert on
> > > > virtio whatsoever, so these are just educated guesses that maybe
> > > > someone can verify/correct. Please bear with me probably messing up 
> > > > the terminology.
> > > > 
> > > > - vhost is looping through available queues.
> > > > - vhost wants to wake up a kworker to process a found queue.
> > > > - kworker does something with that queue and terminates quickly.
> > > > 
> > > > What I found by throwing in some very noisy trace statements was that,
> > > > if the kworker is not woken up, the vhost just keeps looping across
> > > > all available queues (and seems to repeat itself). So it essentially
> > > > relies on the scheduler to schedule the kworker fast enough. Otherwise
> > > > it will just keep on looping until it is migrated off the CPU.
> > > 
> > > 
> > > Normally it takes the buffers off the queue and is done with it.
> > > I am guessing that at the same time guest is running on some other
> > > CPU and keeps adding available buffers?
> > > 
> > 
> > It seems to do just that, there are multiple other vhost instances
> > involved which might keep filling up those queues.
> > 
> 
> No vhost is ever only draining queues. Guest is filling them.
> 
> > Unfortunately, this makes the problematic vhost instance to stay on
> > the CPU and prevents said kworker to get scheduled. The kworker is
> > explicitly woken up by vhost, so it wants it to do something.
> > 
> > At this point it seems that there is an assumption about the scheduler
> > in place which is no longer fulfilled by EEVDF. From the discussion so
> > far, it seems like EEVDF does what it is intended to do.
> > 
> > Shouldn't there be a more explicit mechanism in use that allows the
> > kworker to be scheduled in favor of the vhost?
> > 
> > It is also concerning that the vhost seemingly cannot be preempted by the
> > scheduler while executing that loop.
> 
> 
> Which loop is that, exactly?

The loop continuously passes through translate_desc in drivers/vhost/vhost.c.
That's where I put the trace statements.

The overall sequence seems to be (top to bottom):

handle_rx
get_rx_bufs
vhost_get_vq_desc
vhost_get_avail_head
vhost_get_avail
__vhost_get_user_slow
translate_desc   << trace statement in here
vhost_iotlb_itree_first

These functions show up as having increased overhead in perf.

There are multiple loops going on in there.
Again the disclaimer though, I'm not familiar with that code at all.
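
In case it helps, the trace statements were nothing fancy; from memory
they looked roughly like the sketch below. The translate_desc()
signature mirrors drivers/vhost/vhost.c, but may not match the exact
kernel version:

static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
			  struct iovec *iov, int iov_size, int access)
{
	/* Noisy marker to see how often the loop comes through while
	 * the kworker is still waiting to be scheduled.
	 */
	trace_printk("vhost: translate_desc vq=%p addr=0x%llx len=%u\n",
		     vq, addr, len);
	/* ... body unchanged ... */
}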



Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

2023-12-08 Thread David Hildenbrand

On 07.12.23 05:36, Vishal Verma wrote:

Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.

The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.
The sysfs control allows the administrator to override the above
defaults if needed.

Cc: David Hildenbrand 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Reviewed-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
  drivers/dax/bus.c   | 40 +
  Documentation/ABI/testing/sysfs-bus-dax | 13 +++
  2 files changed, 53 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..11abb57cc031 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1270,6 +1270,45 @@ static ssize_t numa_node_show(struct device *dev,
  }
  static DEVICE_ATTR_RO(numa_node);
  
+static ssize_t memmap_on_memory_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+
+   return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+   struct dax_region *dax_region = dev_dax->region;
+   ssize_t rc;
+   bool val;
+
+   rc = kstrtobool(buf, &val);
+   if (rc)
+   return rc;
+
+   if (dev_dax->memmap_on_memory == val)
+   return len;
+
+   device_lock(dax_region->dev);
+   if (!dax_region->dev->driver) {
+   device_unlock(dax_region->dev);
+   return -ENXIO;
+   }
+
+   device_lock(dev);
+   dev_dax->memmap_on_memory = val;
+   device_unlock(dev);
+
+   device_unlock(dax_region->dev);
+   return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
  static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int 
n)
  {
struct device *dev = container_of(kobj, struct device, kobj);
@@ -1296,6 +1335,7 @@ static struct attribute *dev_dax_attributes[] = {
&dev_attr_align.attr,
&dev_attr_resource.attr,
&dev_attr_numa_node.attr,
+   &dev_attr_memmap_on_memory.attr,
NULL,
  };
  
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index a61a7b186017..bb063a004e41 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -149,3 +149,16 @@ KernelVersion: v5.1
  Contact:  nvd...@lists.linux.dev
  Description:
(RO) The id attribute indicates the region id of a dax region.
+
+What:  /sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date:  October, 2023
+KernelVersion: v6.8
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RW) Control the memmap_on_memory setting if the dax device
+   were to be hotplugged as system memory. This determines whether
+   the 'altmap' for the hotplugged memory will be placed on the
+   device being hotplugged (memmap_on_memory=1) or if it will be
+   placed on regular memory (memmap_on_memory=0). This attribute
+   must be set before the device is handed over to the 'kmem'
+   driver (i.e. hotplugged into system-ram).



Should we note the dependency on other factors as given in
mhp_supports_memmap_on_memory(), especially the system-wide setting and
some weird kernel configurations?


--
Cheers,

David / dhildenb




Re: [PATCH] vdpa: Fix an error handling path in eni_vdpa_probe()

2023-12-08 Thread Michael S. Tsirkin
On Thu, Dec 07, 2023 at 10:13:51PM +0100, Christophe JAILLET wrote:
> Le 20/10/2022 à 21:21, Christophe JAILLET a écrit :
> > After a successful vp_legacy_probe() call, vp_legacy_remove() should be
> > called in the error handling path, as already done in the remove function.
> > 
> > Add the missing call.
> > 
> > Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI")
> > Signed-off-by: Christophe JAILLET 
> > ---
> >   drivers/vdpa/alibaba/eni_vdpa.c | 6 --
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vdpa/alibaba/eni_vdpa.c 
> > b/drivers/vdpa/alibaba/eni_vdpa.c
> > index 5a09a09cca70..cce3d1837104 100644
> > --- a/drivers/vdpa/alibaba/eni_vdpa.c
> > +++ b/drivers/vdpa/alibaba/eni_vdpa.c
> > @@ -497,7 +497,7 @@ static int eni_vdpa_probe(struct pci_dev *pdev, const 
> > struct pci_device_id *id)
> > if (!eni_vdpa->vring) {
> > ret = -ENOMEM;
> > ENI_ERR(pdev, "failed to allocate virtqueues\n");
> > -   goto err;
> > +   goto err_remove_vp_legacy;
> > }
> > for (i = 0; i < eni_vdpa->queues; i++) {
> > @@ -509,11 +509,13 @@ static int eni_vdpa_probe(struct pci_dev *pdev, const 
> > struct pci_device_id *id)
> > ret = vdpa_register_device(&eni_vdpa->vdpa, eni_vdpa->queues);
> > if (ret) {
> > ENI_ERR(pdev, "failed to register to vdpa bus\n");
> > -   goto err;
> > +   goto err_remove_vp_legacy;
> > }
> > return 0;
> > +err_remove_vp_legacy:
> > +   vp_legacy_remove(&eni_vdpa->ldev);
> >   err:
> > put_device(&eni_vdpa->vdpa.dev);
> > return ret;
> 
> Polite reminder on a (very) old patch.
> 
> CJ


Tagged now, thanks!




Re: [PATCH v3 3/5] remoteproc: k3-r5: Add support for IPC-only mode for all R5Fs

2023-12-08 Thread Hari Nagalla

On 11/2/23 11:43, Jan Kiszka wrote:

RTI1 watchdog also powers up R5F core 1. And this could happen either in

When writing "... also powers up...", other than R5F core 1, what else is being
powered?

Would be a question for the SoC vendor - I assumed that only mcu_rti1
[1] goes on when enabling it. But also mcu_r5fss0_core1 is enabled after
that, at least according to the respective TI-SCI query that the is_on
handler is performing. I've tested that under Linux and in U-Boot.

As described in section 12.5.2.1 of the AM64x TRM
(https://www.ti.com/lit/pdf/SPRUIM2), there is an RTI for each CPU core,
and the RTI provisioned for a particular CPU core is not intended to be
used with a different core.
Also, as shown in section 5.2.2.2.1.3.1, the CPU core and the
corresponding RTI share the same power sub-module (LPSC), so enabling
one powers on the other.


As Suman suggested, it seems more appropriate to enable the RTI watchdog
timers in the remoteproc driver. The legacy omap remoteproc drivers have
this support, and it needs to be extended to the k3 remoteproc drivers.
Another option could be to defer the RTI probe until the corresponding
remoteproc driver is probed.


Any other solutions to maintain this order of enabling remote core and 
the corresponding RTI/WDT?
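
For illustration, the deferral option could look roughly like the
sketch below. This is only a sketch under assumptions: the
"ti,watchdog-rproc" phandle name is hypothetical (not an existing
binding), and checking whether the remote core's driver has bound is
just one possible readiness test:

#include <linux/errno.h>
#include <linux/of.h>
#include <linux/of_platform.h>
#include <linux/platform_device.h>

/* Hypothetical: defer the RTI watchdog probe until the remoteproc
 * that owns this RTI instance has been probed.
 */
static int rti_wdt_check_rproc_ready(struct platform_device *pdev)
{
	struct device_node *np;
	struct platform_device *rproc_pdev;
	int ret = 0;

	np = of_parse_phandle(pdev->dev.of_node, "ti,watchdog-rproc", 0);
	if (!np)
		return 0;	/* no associated remote core */

	rproc_pdev = of_find_device_by_node(np);
	of_node_put(np);
	if (!rproc_pdev)
		return -EPROBE_DEFER;

	if (!rproc_pdev->dev.driver)	/* remoteproc not bound yet */
		ret = -EPROBE_DEFER;

	put_device(&rproc_pdev->dev);
	return ret;
}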




Re: [PATCH v2] ring-buffer: Add offset of events in dump on mismatch

2023-12-08 Thread Google
On Thu, 7 Dec 2023 17:31:08 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> For bugs where the ring buffer timestamp gets out of sync, the config
> CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS checks for it, and if a mismatch
> is detected it causes a dump of the bad sub buffer.
> 
> It shows each event and its timestamp as well as the delta in the event.
> But it's also good to see the offset into the subbuffer for that event to
> know how close to the end it is.
> 
> Signed-off-by: Steven Rostedt (Google) 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> ---
> Changes since v1: 
> https://lore.kernel.org/linux-trace-kernel/20231207171613.05920...@gandalf.local.home
> 
> - Add "0x" before the "%x" to make the print less confusing (Mathieu 
> Desnoyers)
> 
>  kernel/trace/ring_buffer.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 8d2a4f00eca9..09d4b1aa8820 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -3424,23 +3424,27 @@ static void dump_buffer_page(struct buffer_data_page 
> *bpage,
>   case RINGBUF_TYPE_TIME_EXTEND:
>   delta = rb_event_time_stamp(event);
>   ts += delta;
> - pr_warn("  [%lld] delta:%lld TIME EXTEND\n", ts, delta);
> + pr_warn(" 0x%x: [%lld] delta:%lld TIME EXTEND\n",
> + e, ts, delta);
>   break;
>  
>   case RINGBUF_TYPE_TIME_STAMP:
>   delta = rb_event_time_stamp(event);
>   ts = rb_fix_abs_ts(delta, ts);
> - pr_warn("  [%lld] absolute:%lld TIME STAMP\n", ts, 
> delta);
> + pr_warn(" 0x%x:  [%lld] absolute:%lld TIME STAMP\n",
> + e, ts, delta);
>   break;
>  
>   case RINGBUF_TYPE_PADDING:
>   ts += event->time_delta;
> - pr_warn("  [%lld] delta:%d PADDING\n", ts, 
> event->time_delta);
> + pr_warn(" 0x%x:  [%lld] delta:%d PADDING\n",
> + e, ts, event->time_delta);
>   break;
>  
>   case RINGBUF_TYPE_DATA:
>   ts += event->time_delta;
> - pr_warn("  [%lld] delta:%d\n", ts, event->time_delta);
> + pr_warn(" 0x%x:  [%lld] delta:%d\n",
> + e, ts, event->time_delta);
>   break;
>  
>   default:
> -- 
> 2.42.0
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type

2023-12-08 Thread Michael S. Tsirkin
On Fri, Dec 08, 2023 at 12:01:15PM +0100, Maxime Coquelin wrote:
> Hello Paul,
> 
> On 11/8/23 03:31, Paul Moore wrote:
> > On Oct 20, 2023 "Michael S. Tsirkin"  wrote:
> > > 
> > > This patch introduces LSM hooks for device creation,
> > > destruction and opening operations, checking that the
> > > application is allowed to perform these operations for
> > > the Virtio device type.
> > > 
> > > Signed-off-by: Maxime Coquelin 
> > > ---
> > >   drivers/vdpa/vdpa_user/vduse_dev.c  | 12 +++
> > >   include/linux/lsm_hook_defs.h   |  4 +++
> > >   include/linux/security.h| 15 
> > >   security/security.c | 42 ++
> > >   security/selinux/hooks.c| 55 +
> > >   security/selinux/include/classmap.h |  2 ++
> > >   6 files changed, 130 insertions(+)
> > 
> > My apologies for the late reply, I've been trying to work my way through
> > the review backlog but it has been taking longer than expected; comments
> > below ...
> 
> No worries, I have also been busy these days.
> 
> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > index 2aa0e219d721..65d9262a37f7 100644
> > > --- a/security/selinux/hooks.c
> > > +++ b/security/selinux/hooks.c
> > > @@ -21,6 +21,7 @@
> > >*  Copyright (C) 2016 Mellanox Technologies
> > >*/
> > > +#include "av_permissions.h"
> > >   #include 
> > >   #include 
> > >   #include 
> > > @@ -92,6 +93,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include "avc.h"
> > >   #include "objsec.h"
> > > @@ -6950,6 +6952,56 @@ static int selinux_uring_cmd(struct io_uring_cmd 
> > > *ioucmd)
> > >   }
> > >   #endif /* CONFIG_IO_URING */
> > > +static int vduse_check_device_type(u32 sid, u32 device_id)
> > > +{
> > > + u32 requested;
> > > +
> > > + if (device_id == VIRTIO_ID_NET)
> > > + requested = VDUSE__NET;
> > > + else if (device_id == VIRTIO_ID_BLOCK)
> > > + requested = VDUSE__BLOCK;
> > > + else
> > > + return -EINVAL;
> > > +
> > > + return avc_has_perm(sid, sid, SECCLASS_VDUSE, requested, NULL);
> > > +}
> > > +
> > > +static int selinux_vduse_dev_create(u32 device_id)
> > > +{
> > > + u32 sid = current_sid();
> > > + int ret;
> > > +
> > > + ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVCREATE, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + return vduse_check_device_type(sid, device_id);
> > > +}
> > 
> > I see there has been some discussion about the need for a dedicated
> > create hook as opposed to using the existing ioctl controls.  I think
> > one important point that has been missing from the discussion is the
> > idea of labeling the newly created device.  Unfortunately prior to a
> > few minutes ago I hadn't ever looked at VDUSE so please correct me if
> > I get some things wrong :)
> > 
> >  From what I can see userspace creates a new VDUSE device with
> > ioctl(VDUSE_CREATE_DEV), which triggers the creation of a new
> > /dev/vduse/XXX device which will be labeled according to the udev
> > and SELinux configuration, likely with a generic udev label.  My
> > question is if we want to be able to uniquely label each VDUSE
> > device based on the process that initiates the device creation
> > with the call to ioctl()?  If that is the case, we would need a
> > create hook not only to control the creation of the device, but to
> > record the triggering process' label in the new device; this label
> > would then be used in subsequent VDUSE open and destroy operations.
> > The normal device file I/O operations would still be subject to the
> > standard SELinux file I/O permissions using the device file label
> > assigned by systemd/udev when the device was created.
> 
> I don't think we need a unique label for VDUSE devices, but maybe
> Michael thinks otherwise?

I don't know.
All this is consumed by libvirt, you need to ask these guys.


> > 
> > > +static int selinux_vduse_dev_destroy(u32 device_id)
> > > +{
> > > + u32 sid = current_sid();
> > > + int ret;
> > > +
> > > + ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVDESTROY, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + return vduse_check_device_type(sid, device_id);
> > > +}
> > > +
> > > +static int selinux_vduse_dev_open(u32 device_id)
> > > +{
> > > + u32 sid = current_sid();
> > > + int ret;
> > > +
> > > + ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVOPEN, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + return vduse_check_device_type(sid, device_id);
> > > +}
> > > +
> > >   /*
> > >* IMPORTANT NOTE: When adding new hooks, please be careful to keep 
> > > this order:
> > >* 1. any hooks that don't belong to (2.) or (3.) below,
> > > @@ -7243,6 +7295,9 @@ static struct security_hook_list selinux_hooks[] 
> > > __ro_after_init = {
> > >   #ifdef CONFIG_PERF_EVENTS
> > >   LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
> > >   

Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type

2023-12-08 Thread Maxime Coquelin

Hello Paul,

On 11/8/23 03:31, Paul Moore wrote:

On Oct 20, 2023 "Michael S. Tsirkin"  wrote:


This patch introduces LSM hooks for device creation,
destruction and opening operations, checking that the
application is allowed to perform these operations for
the Virtio device type.

Signed-off-by: Maxime Coquelin 
---
  drivers/vdpa/vdpa_user/vduse_dev.c  | 12 +++
  include/linux/lsm_hook_defs.h   |  4 +++
  include/linux/security.h| 15 
  security/security.c | 42 ++
  security/selinux/hooks.c| 55 +
  security/selinux/include/classmap.h |  2 ++
  6 files changed, 130 insertions(+)


My apologies for the late reply, I've been trying to work my way through
the review backlog but it has been taking longer than expected; comments
below ...


No worries, I have also been busy these days.


diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 2aa0e219d721..65d9262a37f7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -21,6 +21,7 @@
   *  Copyright (C) 2016 Mellanox Technologies
   */
  
+#include "av_permissions.h"

  #include 
  #include 
  #include 
@@ -92,6 +93,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "avc.h"

  #include "objsec.h"
@@ -6950,6 +6952,56 @@ static int selinux_uring_cmd(struct io_uring_cmd *ioucmd)
  }
  #endif /* CONFIG_IO_URING */
  
+static int vduse_check_device_type(u32 sid, u32 device_id)

+{
+   u32 requested;
+
+   if (device_id == VIRTIO_ID_NET)
+   requested = VDUSE__NET;
+   else if (device_id == VIRTIO_ID_BLOCK)
+   requested = VDUSE__BLOCK;
+   else
+   return -EINVAL;
+
+   return avc_has_perm(sid, sid, SECCLASS_VDUSE, requested, NULL);
+}
+
+static int selinux_vduse_dev_create(u32 device_id)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVCREATE, NULL);
+   if (ret)
+   return ret;
+
+   return vduse_check_device_type(sid, device_id);
+}


I see there has been some discussion about the need for a dedicated
create hook as opposed to using the existing ioctl controls.  I think
one important point that has been missing from the discussion is the
idea of labeling the newly created device.  Unfortunately prior to a
few minutes ago I hadn't ever looked at VDUSE so please correct me if
I get some things wrong :)

 From what I can see userspace creates a new VDUSE device with
ioctl(VDUSE_CREATE_DEV), which triggers the creation of a new
/dev/vduse/XXX device which will be labeled according to the udev
and SELinux configuration, likely with a generic udev label.  My
question is if we want to be able to uniquely label each VDUSE
device based on the process that initiates the device creation
with the call to ioctl()?  If that is the case, we would need a
create hook not only to control the creation of the device, but to
record the triggering process' label in the new device; this label
would then be used in subsequent VDUSE open and destroy operations.
The normal device file I/O operations would still be subject to the
standard SELinux file I/O permissions using the device file label
assigned by systemd/udev when the device was created.


I don't think we need a unique label for VDUSE devices, but maybe
Michael thinks otherwise?




+static int selinux_vduse_dev_destroy(u32 device_id)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVDESTROY, NULL);
+   if (ret)
+   return ret;
+
+   return vduse_check_device_type(sid, device_id);
+}
+
+static int selinux_vduse_dev_open(u32 device_id)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   ret = avc_has_perm(sid, sid, SECCLASS_VDUSE, VDUSE__DEVOPEN, NULL);
+   if (ret)
+   return ret;
+
+   return vduse_check_device_type(sid, device_id);
+}
+
  /*
   * IMPORTANT NOTE: When adding new hooks, please be careful to keep this 
order:
   * 1. any hooks that don't belong to (2.) or (3.) below,
@@ -7243,6 +7295,9 @@ static struct security_hook_list selinux_hooks[] 
__ro_after_init = {
  #ifdef CONFIG_PERF_EVENTS
LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
  #endif
+   LSM_HOOK_INIT(vduse_dev_create, selinux_vduse_dev_create),
+   LSM_HOOK_INIT(vduse_dev_destroy, selinux_vduse_dev_destroy),
+   LSM_HOOK_INIT(vduse_dev_open, selinux_vduse_dev_open),
  };
  
  static __init int selinux_init(void)

diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index a3c380775d41..d3dc37fb03d4 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -256,6 +256,8 @@ const struct security_class_mapping secclass_map[] = {
  { "override_creds", "sqpoll", "cmd", NULL } },
{ "user_namespace",
  { "create", NULL } },
+	{ "vduse",
+	  { "devcreate", "devdestroy", "devopen", "net", "block", NULL } },
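
For illustration only (this is not a hunk from the patch, and the function
name below is made up), the consumer side in vduse_dev.c would look roughly
like this, invoking the new hook before creating a device:

static int vduse_create_dev_checked(struct vduse_dev_config *config)
{
	int ret;

	/* Ask the LSM whether current may create this device type. */
	ret = security_vduse_dev_create(config->device_id);
	if (ret)
		return ret;

	/* ... proceed with normal VDUSE device creation ... */
	return 0;
}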

Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-12-08 Thread Michael S. Tsirkin
On Fri, Dec 08, 2023 at 10:24:16AM +0100, Tobias Huschle wrote:
> On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
> > > 3. vhost looping endlessly, waiting for kworker to be scheduled
> > > 
> > > I dug a little deeper on what the vhost is doing. I'm not an expert on
> > > virtio whatsoever, so these are just educated guesses that maybe
> > > someone can verify/correct. Please bear with me probably messing up 
> > > the terminology.
> > > 
> > > - vhost is looping through available queues.
> > > - vhost wants to wake up a kworker to process a found queue.
> > > - kworker does something with that queue and terminates quickly.
> > > 
> > > What I found by throwing in some very noisy trace statements was that,
> > > if the kworker is not woken up, the vhost just keeps looping across
> > > all available queues (and seems to repeat itself). So it essentially
> > > relies on the scheduler to schedule the kworker fast enough. Otherwise
> > > it will just keep on looping until it is migrated off the CPU.
> > 
> > 
> > Normally it takes the buffers off the queue and is done with it.
> > I am guessing that at the same time guest is running on some other
> > CPU and keeps adding available buffers?
> > 
> 
> It seems to do just that, there are multiple other vhost instances
> involved which might keep filling up those queues. 
> 

No, vhost is only ever draining queues. The guest is filling them.

> Unfortunately, this makes the problematic vhost instance to stay on
> the CPU and prevents said kworker to get scheduled. The kworker is
> explicitly woken up by vhost, so it wants it to do something.
> 
> At this point it seems that there is an assumption about the scheduler
> in place which is no longer fulfilled by EEVDF. From the discussion so
> far, it seems like EEVDF does what is intended to do.
> 
> Shouldn't there be a more explicit mechanism in use that allows the
> kworker to be scheduled in favor of the vhost?
> 
> It is also concerning that the vhost seemingly cannot be preempted by the
> scheduler while executing that loop.


Which loop is that, exactly?

-- 
MST




[PATCH v4 33/33] Documentation: probes: Update fprobe on function-graph tracer

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Update fprobe documentation for the new fprobe on function-graph
tracer. This includes some behavior changes and the pt_regs to
ftrace_regs interface change.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Update @fregs parameter explanation.
---
 Documentation/trace/fprobe.rst |   42 ++--
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
index 196f52386aaa..f58bdc64504f 100644
--- a/Documentation/trace/fprobe.rst
+++ b/Documentation/trace/fprobe.rst
@@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe
 Introduction
 
 
-Fprobe is a function entry/exit probe mechanism based on ftrace.
-Instead of using ftrace full feature, if you only want to attach callbacks
-on function entry and exit, similar to the kprobes and kretprobes, you can
+Fprobe is a function entry/exit probe mechanism based on the function-graph
+tracer.
+Instead of tracing all functions, if you want to attach callbacks on specific
+function entry and exit, similar to the kprobes and kretprobes, you can
 use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
 instrumentation for multiple functions with single handler. This document
 describes how to use fprobe.
@@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as 
follows:
 
 .. code-block:: c
 
- int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct pt_regs *regs, void *entry_data);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
- void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct pt_regs *regs, void *entry_data);
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
-Note that the @entry_ip is saved at function entry and passed to exit handler.
-If the entry callback function returns !0, the corresponding exit callback 
will be cancelled.
+Note that the @entry_ip is saved at function entry and passed to exit
+handler.
+If the entry callback function returns !0, the corresponding exit callback
+will be cancelled.
 
 @fp
 This is the address of `fprobe` data structure related to this handler.
@@ -112,12 +115,10 @@ If the entry callback function returns !0, the 
corresponding exit callback will
 This is the return address that the traced function will return to,
 somewhere in the caller. This can be used at both entry and exit.
 
-@regs
-This is the `pt_regs` data structure at the entry and exit. Note that
-the instruction pointer of @regs may be different from the @entry_ip
-in the entry_handler. If you need traced instruction pointer, you need
-to use @entry_ip. On the other hand, in the exit_handler, the 
instruction
-pointer of @regs is set to the current return address.
+@fregs
+This is the `ftrace_regs` data structure at the entry and exit. This
+includes the function parameters, or the return values. So user can
+access those values via appropriate `ftrace_regs_*` APIs.
 
 @entry_data
 This is a local storage to share the data between entry and exit 
handlers.
@@ -125,6 +126,17 @@ If the entry callback function returns !0, the 
corresponding exit callback will
 and `entry_data_size` field when registering the fprobe, the storage is
 allocated and passed to both `entry_handler` and `exit_handler`.
 
+Entry data size and exit handlers on the same function
+=======================================================
+
+Since the entry data is passed via a per-task stack, which has a limited
+size, the entry data size per probe is limited to `15 * sizeof(long)`.
+Also note that if different fprobes are probing the same function, this
+limit becomes smaller: the entry data size is aligned to `sizeof(long)`,
+and each fprobe which has an exit handler consumes a `sizeof(long)` slot
+on that stack. Thus you should keep the number of fprobes on the same
+function as small as possible.
+
 Share the callbacks with kprobes
 
 
@@ -165,8 +177,8 @@ This counter counts up when;
  - fprobe fails to take ftrace_recursion lock. This usually means that a 
function
which is traced by other ftrace users is called from the entry_handler.
 
- - fprobe fails to setup the function exit because of the shortage of rethook
-   (the shadow stack for hooking the function return.)
+ - fprobe fails to setup the function exit because of failing to allocate the
+   data buffer from the per-task shadow stack.
 
The `fprobe::nmissed` field counts up in both cases. Therefore, the former
skips both the entry and exit callbacks and the latter skips only the exit
callback.
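
As an illustration of the callback prototypes above, a minimal module-side
sketch (assuming this series plus an arch providing ftrace_regs_get_argument();
the 'sample_*' names are made up):

#include <linux/fprobe.h>
#include <linux/ftrace.h>
#include <linux/module.h>

static int sample_entry(struct fprobe *fp, unsigned long entry_ip,
			unsigned long ret_ip, struct ftrace_regs *fregs,
			void *entry_data)
{
	/* Stash the first argument for the exit handler. */
	*(unsigned long *)entry_data = ftrace_regs_get_argument(fregs, 0);
	return 0;	/* returning !0 would cancel the exit callback */
}

static void sample_exit(struct fprobe *fp, unsigned long entry_ip,
			unsigned long ret_ip, struct ftrace_regs *fregs,
			void *entry_data)
{
	pr_info("arg0=0x%lx ret=0x%lx\n", *(unsigned long *)entry_data,
		ftrace_regs_get_return_value(fregs));
}

static struct fprobe sample_fprobe = {
	.entry_handler	 = sample_entry,
	.exit_handler	 = sample_exit,
	.entry_data_size = sizeof(unsigned long),
};

/* call register_fprobe(&sample_fprobe, "vfs_read", NULL) from module init */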




[PATCH v4 32/33] selftests: ftrace: Remove obsolete maxactive syntax check

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Since the fprobe event does not support maxactive anymore, stop
testing the maxactive syntax error checking.

Signed-off-by: Masami Hiramatsu (Google) 
---
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 20e42c030095..66516073ff27 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -16,9 +16,7 @@ aarch64)
   REG=%r0 ;;
 esac
 
-check_error 'f^100 vfs_read'   # MAXACT_NO_KPROBE
-check_error 'f^1a111 vfs_read' # BAD_MAXACT
-check_error 'f^10 vfs_read'# MAXACT_TOO_BIG
+check_error 'f^100 vfs_read'   # BAD_MAXACT
 
 check_error 'f ^non_exist_func'# BAD_PROBE_ADDR (enoent)
 check_error 'f ^vfs_read+10'   # BAD_PROBE_ADDR




[PATCH v4 31/33] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Enable the kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
converted from ftrace_regs by ftrace_partial_regs(), thus some registers
(those not saved in ftrace_regs) may always read as 0. But it should be
enough for function entry (accessing arguments) and exit (accessing the
return value).

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes from previous series: NOTHING, Update against the new series.
---
 kernel/trace/bpf_trace.c |   22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index efb792f8f2ea..24ee4e960f1d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2503,7 +2503,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
+#ifdef CONFIG_FPROBE
 struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2526,6 +2526,8 @@ struct user_syms {
char *buf;
 };
 
+static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);
+
 static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, 
u32 cnt)
 {
unsigned long __user usymbol;
@@ -2703,13 +2705,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx 
*ctx)
 
 static int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
-  unsigned long entry_ip, struct pt_regs *regs)
+  unsigned long entry_ip, struct ftrace_regs *fregs)
 {
struct bpf_kprobe_multi_run_ctx run_ctx = {
.link = link,
.entry_ip = entry_ip,
};
struct bpf_run_ctx *old_run_ctx;
+   struct pt_regs *regs;
int err;
 
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
@@ -2720,6 +2723,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link 
*link,
 
migrate_disable();
rcu_read_lock();
+   regs = ftrace_partial_regs(fregs, this_cpu_ptr(&bpf_kprobe_multi_pt_regs));
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
@@ -2737,13 +2741,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned 
long fentry_ip,
  void *data)
 {
struct bpf_kprobe_multi_link *link;
-   struct pt_regs *regs = ftrace_get_regs(fregs);
-
-   if (!regs)
-   return 0;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
return 0;
 }
 
@@ -2753,13 +2753,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, 
unsigned long fentry_ip,
   void *data)
 {
struct bpf_kprobe_multi_link *link;
-   struct pt_regs *regs = ftrace_get_regs(fregs);
-
-   if (!regs)
-   return;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
 }
 
 static int symbols_cmp_r(const void *a, const void *b, const void *priv)
@@ -3016,7 +3012,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr 
*attr, struct bpf_prog *pr
kvfree(cookies);
return err;
 }
-#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+#else /* !CONFIG_FPROBE */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog 
*prog)
 {
return -EOPNOTSUPP;




[PATCH v4 30/33] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_PT_REGS_TO_FTRACE_REGS_CAST,
fprobe events cannot be used from perf.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Use ftrace_regs_get_return_value().
 Changes in v2:
  - Define ftrace_regs_get_kernel_stack_nth() for
!CONFIG_HAVE_REGS_AND_STACK_ACCESS_API.
 Changes from previous series: Update against the new series.
---
 include/linux/ftrace.h  |   17 +
 kernel/trace/Kconfig|1 -
 kernel/trace/trace_fprobe.c |   74 ---
 kernel/trace/trace_probe_tmpl.h |2 +
 4 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8150edcf8496..ad28daa507f7 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -250,6 +250,23 @@ static __always_inline bool ftrace_regs_has_args(struct 
ftrace_regs *fregs)
regs_query_register_offset(name)
 #endif
 
+#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API
+static __always_inline unsigned long
+ftrace_regs_get_kernel_stack_nth(struct ftrace_regs *fregs, unsigned int nth)
+{
+   unsigned long *stackp;
+
+   stackp = (unsigned long *)ftrace_regs_get_stack_pointer(fregs);
+   if (((unsigned long)(stackp + nth) & ~(THREAD_SIZE - 1)) ==
+   ((unsigned long)stackp & ~(THREAD_SIZE - 1)))
+   return *(stackp + nth);
+
+   return 0;
+}
+#else /* !CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+#define ftrace_regs_get_kernel_stack_nth(fregs, nth)   (0L)
+#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
  struct ftrace_ops *op, struct ftrace_regs *fregs);
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 11a96275b68c..169588021d90 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -681,7 +681,6 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
-   depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 22cf51217b81..d96de0dbc0cb 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -132,7 +132,7 @@ static int
 process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
   void *base)
 {
-   struct pt_regs *regs = rec;
+   struct ftrace_regs *fregs = rec;
unsigned long val;
int ret;
 
@@ -140,17 +140,17 @@ process_fetch_insn(struct fetch_insn *code, void *rec, 
void *dest,
/* 1st stage: get value from context */
switch (code->op) {
case FETCH_OP_STACK:
-   val = regs_get_kernel_stack_nth(regs, code->param);
+   val = ftrace_regs_get_kernel_stack_nth(fregs, code->param);
break;
case FETCH_OP_STACKP:
-   val = kernel_stack_pointer(regs);
+   val = ftrace_regs_get_stack_pointer(fregs);
break;
case FETCH_OP_RETVAL:
-   val = regs_return_value(regs);
+   val = ftrace_regs_get_return_value(fregs);
break;
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
case FETCH_OP_ARG:
-   val = regs_get_kernel_argument(regs, code->param);
+   val = ftrace_regs_get_argument(fregs, code->param);
break;
 #endif
case FETCH_NOP_SYMBOL:  /* Ignore a place holder */
@@ -170,7 +170,7 @@ NOKPROBE_SYMBOL(process_fetch_insn)
 /* function entry handler */
 static nokprobe_inline void
 __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-   struct pt_regs *regs,
+   struct ftrace_regs *fregs,
struct trace_event_file *trace_file)
 {
struct fentry_trace_entry_head *entry;
@@ -184,36 +184,36 @@ __fentry_trace_func(struct trace_fprobe *tf, unsigned 
long entry_ip,
if (trace_trigger_soft_disabled(trace_file))
return;
 
-   dsize = __get_data_size(&tf->tp, regs);
+   dsize = __get_data_size(&tf->tp, fregs);
 
entry = trace_event_buffer_reserve(&fbuffer, trace_file,
   sizeof(*entry) + tf->tp.size + dsize);
if (!entry)
return;
 
-   fbuffer.regs = regs;
+   fbuffer.regs = ftrace_get_regs(fregs);
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->ip = entry_ip;
-   store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+   store_trace_args(&entry[1], &tf->tp, fregs, sizeof(*entry), dsize);
 
trace_event_buffer_commit(&fbuffer);
 }
 
 static void
 

[PATCH v4 29/33] tracing/fprobe: Remove nr_maxactive from fprobe

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Remove the deprecated fprobe::nr_maxactive. This makes fprobe events
reject the maxactive number.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Newly added.
---
 include/linux/fprobe.h  |2 --
 kernel/trace/trace_fprobe.c |   44 ++-
 2 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 08b37b0d1d05..c28d06ddfb8e 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -47,7 +47,6 @@ struct fprobe_hlist {
  * @nmissed: The counter for missing events.
  * @flags: The status flag.
  * @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions. (*deprecated)
  * @entry_handler: The callback function for function entry.
  * @exit_handler: The callback function for function exit.
  * @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
@@ -56,7 +55,6 @@ struct fprobe {
unsigned long   nmissed;
unsigned intflags;
size_t  entry_data_size;
-   int nr_maxactive;
 
int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
 unsigned long ret_ip, struct ftrace_regs *regs,
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 3982626c82e6..22cf51217b81 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -375,7 +375,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char 
*group,
   const char *event,
   const char *symbol,
   struct tracepoint *tpoint,
-  int maxactive,
   int nargs, bool is_return)
 {
struct trace_fprobe *tf;
@@ -395,7 +394,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char 
*group,
tf->fp.entry_handler = fentry_dispatcher;
 
tf->tpoint = tpoint;
-   tf->fp.nr_maxactive = maxactive;
 
ret = trace_probe_init(>tp, event, group, false);
if (ret < 0)
@@ -974,12 +972,11 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
 *  FETCHARG:TYPE : use TYPE instead of unsigned long.
 */
struct trace_fprobe *tf = NULL;
-   int i, len, new_argc = 0, ret = 0;
+   int i, new_argc = 0, ret = 0;
bool is_return = false;
char *symbol = NULL;
const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
const char **new_argv = NULL;
-   int maxactive = 0;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
char sbuf[KSYM_NAME_LEN];
@@ -1000,33 +997,13 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
 
trace_probe_log_init("trace_fprobe", argc, argv);
 
-   event = strchr(&argv[0][1], ':');
-   if (event)
-   event++;
-
-   if (isdigit(argv[0][1])) {
-   if (event)
-   len = event - &argv[0][1] - 1;
-   else
-   len = strlen([0][1]);
-   if (len > MAX_EVENT_NAME_LEN - 1) {
-   trace_probe_log_err(1, BAD_MAXACT);
-   goto parse_error;
-   }
-   memcpy(buf, &argv[0][1], len);
-   buf[len] = '\0';
-   ret = kstrtouint(buf, 0, &maxactive);
-   if (ret || !maxactive) {
+   if (argv[0][1] != '\0') {
+   if (argv[0][1] != ':') {
+   trace_probe_log_set_index(0);
trace_probe_log_err(1, BAD_MAXACT);
goto parse_error;
}
-   /* fprobe rethook instances are iterated over via a list. The
-* maximum should stay reasonable.
-*/
-   if (maxactive > RETHOOK_MAXACTIVE_MAX) {
-   trace_probe_log_err(1, MAXACT_TOO_BIG);
-   goto parse_error;
-   }
+   event = &argv[0][2];
}
 
trace_probe_log_set_index(1);
@@ -1036,12 +1013,6 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
if (ret < 0)
goto parse_error;
 
-   if (!is_return && maxactive) {
-   trace_probe_log_set_index(0);
-   trace_probe_log_err(1, BAD_MAXACT_TYPE);
-   goto parse_error;
-   }
-
trace_probe_log_set_index(0);
if (event) {
ret = traceprobe_parse_event_name(&event, &group, gbuf,
@@ -1095,8 +1066,7 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
}
 
/* setup a probe */
-   tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
-   argc, is_return);
+	tf = alloc_trace_fprobe(group, event, symbol, tpoint, argc, is_return);

[PATCH v4 28/33] fprobe: Rewrite fprobe on function-graph tracer

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Rewrite the fprobe implementation on top of the function-graph tracer.
Major API changes are:
 -  The 'nr_maxactive' field is deprecated.
 -  This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
    !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
    CONFIG_HAVE_FUNCTION_GRAPH_FREGS, so it currently works only
    on x86_64.
 -  Currently the entry data size is limited to 15 * sizeof(long).
 -  If too many fprobe exit handlers are set on the same
    function, it will fail to probe.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Update for new reserve_data/retrieve_data API.
  - Fix internal push/pop on fgraph data logic so that it can
correctly save/restore the returning fprobes.
 Changes in v2:
  - Add more lockdep_assert_held(fprobe_mutex)
  - Use READ_ONCE() and WRITE_ONCE() for fprobe_hlist_node::fp.
  - Add NOKPROBE_SYMBOL() for the functions which is called from
entry/exit callback.
---
 include/linux/fprobe.h |   54 +++-
 kernel/trace/Kconfig   |8 -
 kernel/trace/fprobe.c  |  632 ++--
 lib/test_fprobe.c  |   45 ---
 4 files changed, 494 insertions(+), 245 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 879a30956009..08b37b0d1d05 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -5,32 +5,56 @@
 
 #include 
 #include 
-#include 
+#include 
+#include 
+#include 
+
+struct fprobe;
+
+/**
+ * struct fprobe_hlist_node - address based hash list node for fprobe.
+ *
+ * @hlist: The hlist node for address search hash table.
+ * @addr: The address represented by this.
+ * @fp: The fprobe which owns this.
+ */
+struct fprobe_hlist_node {
+   struct hlist_node   hlist;
+   unsigned long   addr;
+   struct fprobe   *fp;
+};
+
+/**
+ * struct fprobe_hlist - hash list nodes for fprobe.
+ *
+ * @hlist: The hlist node for existence checking hash table.
+ * @rcu: rcu_head for RCU deferred release.
+ * @fp: The fprobe which owns this fprobe_hlist.
+ * @size: The size of @array.
+ * @array: The fprobe_hlist_node for each address to probe.
+ */
+struct fprobe_hlist {
+   struct hlist_node   hlist;
+   struct rcu_head rcu;
+   struct fprobe   *fp;
+   int size;
+   struct fprobe_hlist_nodearray[];
+};
 
 /**
  * struct fprobe - ftrace based probe.
- * @ops: The ftrace_ops.
+ *
  * @nmissed: The counter for missing events.
  * @flags: The status flag.
- * @rethook: The rethook data structure. (internal data)
  * @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions.
+ * @nr_maxactive: The max number of active functions. (*deprecated)
  * @entry_handler: The callback function for function entry.
  * @exit_handler: The callback function for function exit.
+ * @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
  */
 struct fprobe {
-#ifdef CONFIG_FUNCTION_TRACER
-   /*
-* If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too.
-* But user of fprobe may keep embedding the struct fprobe on their own
-* code. To avoid build error, this will keep the fprobe data structure
-* defined here, but remove ftrace_ops data structure.
-*/
-   struct ftrace_ops   ops;
-#endif
unsigned long   nmissed;
unsigned intflags;
-   struct rethook  *rethook;
size_t  entry_data_size;
int nr_maxactive;
 
@@ -40,6 +64,8 @@ struct fprobe {
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
 unsigned long ret_ip, struct ftrace_regs *fregs,
 void *entry_data);
+
+   struct fprobe_hlist *hlist_array;
 };
 
 /* This fprobe is soft-disabled. */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 1a2544712690..11a96275b68c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -296,11 +296,9 @@ config DYNAMIC_FTRACE_WITH_ARGS
 
 config FPROBE
bool "Kernel Function Probe (fprobe)"
-   depends on FUNCTION_TRACER
-   depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
-   depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || 
!HAVE_DYNAMIC_FTRACE_WITH_ARGS
-   depends on HAVE_RETHOOK
-   select RETHOOK
+   depends on FUNCTION_GRAPH_TRACER
+   depends on HAVE_FUNCTION_GRAPH_FREGS
+   depends on DYNAMIC_FTRACE_WITH_ARGS || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
default n
help
  This option enables kernel function probe (fprobe) based on ftrace.
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 688b897626b4..53e681c2458b 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -8,98 +8,193 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 #include 
 
 

[PATCH v4 27/33] tracing: Add ftrace_fill_perf_regs() for perf event

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_fill_perf_regs(), which should be compatible with
perf_fetch_caller_regs(). In other words, the pt_regs returned from
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be
used for stack tracing.

Signed-off-by: Masami Hiramatsu (Google) 
---
  Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h   |7 +++
 arch/powerpc/include/asm/ftrace.h |7 +++
 arch/s390/include/asm/ftrace.h|5 +
 arch/x86/include/asm/ftrace.h |7 +++
 include/linux/ftrace.h|   31 +++
 5 files changed, 57 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 31051fa2b4d9..c1921bdf760b 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -154,6 +154,13 @@ ftrace_partial_regs(const struct ftrace_regs *fregs, 
struct pt_regs *regs)
return regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->pc = (fregs)->pc;  \
+   (_regs)->regs[29] = (fregs)->fp;\
+   (_regs)->sp = (fregs)->sp;  \
+   (_regs)->pstate = PSR_MODE_EL1h;\
+   } while (0)
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 7e138e0e3baf..8737a794c764 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -52,6 +52,13 @@ static __always_inline struct pt_regs 
*arch_ftrace_get_regs(struct ftrace_regs *
return fregs->regs.msr ? &fregs->regs : NULL;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->result = 0;\
+   (_regs)->nip = (fregs)->regs.nip;   \
+   (_regs)->gpr[1] = (fregs)->regs.gpr[1]; \
+   asm volatile("mfmsr %0" : "=r" ((_regs)->msr)); \
+   } while (0)
+
 static __always_inline void
 ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
unsigned long ip)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 01e775c98425..c2a269c1617c 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -97,6 +97,11 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
*fregs,
 #define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs)do {   \
+   (_regs)->psw.addr = (fregs)->regs.psw.addr; \
+   (_regs)->gprs[15] = (fregs)->regs.gprs[15]; \
+   } while (0)
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 /*
  * When an ftrace registered caller is tracing a function that is
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a061f8832b20..2e3de45e9746 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -54,6 +54,13 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
return &fregs->regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->ip = (fregs)->regs.ip; \
+   (_regs)->sp = (fregs)->regs.sp; \
+   (_regs)->cs = __KERNEL_CS;  \
+   (_regs)->flags = 0; \
+   } while (0)
+
 #define ftrace_regs_set_instruction_pointer(fregs, _ip)\
do { (fregs)->regs.ip = (_ip); } while (0)
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 515ec804d605..8150edcf8496 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -190,6 +190,37 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct 
pt_regs *regs)
 
 #endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || 
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
 
+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
+
+/*
+ * Please define arch dependent pt_regs which compatible to the
+ * perf_arch_fetch_caller_regs() but based on ftrace_regs.
+ * This requires
+ *   - user_mode(_regs) returns false (always kernel mode).
+ *   - able to use the _regs for stack trace.
+ */
+#ifndef arch_ftrace_fill_perf_regs
+/* As same as perf_arch_fetch_caller_regs(), do nothing by default */
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {} while (0)
+#endif
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   arch_ftrace_fill_perf_regs(fregs, regs);
+   return regs;
+}
+
+#else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   
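
For illustration, the intended calling pattern would be something like this
(a sketch, not part of the patch; the handler name is made up):

static void sample_perf_handler(struct ftrace_regs *fregs)
{
	struct pt_regs regs_buf = { };
	struct pt_regs *regs;

	/* Build a perf-compatible pt_regs from the ftrace_regs. */
	regs = ftrace_fill_perf_regs(fregs, &regs_buf);

	/*
	 * Here user_mode(regs) is false and regs carries the kernel
	 * ip/sp needed for stack tracing, as perf_fetch_caller_regs()
	 * would provide.
	 */
}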

[PATCH v4 26/33] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs.
If the architecture defines its own ftrace_regs, this copies partial
registers to pt_regs and returns it. If not, ftrace_regs is the same as
pt_regs and ftrace_partial_regs() will return ftrace_regs::regs.

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h |   11 +++
 include/linux/ftrace.h  |   17 +
 2 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index efd5dbf74dd6..31051fa2b4d9 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -143,6 +143,17 @@ ftrace_override_function_with_return(struct ftrace_regs 
*fregs)
fregs->pc = fregs->lr;
 }
 
+static __always_inline struct pt_regs *
+ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   memcpy(regs->regs, fregs->regs, sizeof(u64) * 9);
+   regs->sp = fregs->sp;
+   regs->pc = fregs->pc;
+   regs->regs[29] = fregs->fp;
+   regs->regs[30] = fregs->lr;
+   return regs;
+}
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index a72a2eaec576..515ec804d605 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -173,6 +173,23 @@ static __always_inline struct pt_regs 
*ftrace_get_regs(struct ftrace_regs *fregs
return arch_ftrace_get_regs(fregs);
 }
 
+#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \
+   defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)
+
+static __always_inline struct pt_regs *
+ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   /*
+* If CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, ftrace_regs memory
+* layout is the same as pt_regs. So always returns that address.
+* Since arch_ftrace_get_regs() will check some members and may return
+* NULL, we can not use it.
+*/
+   return &fregs->regs;
+}
+
+#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || 
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
 /*
  * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
  * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.
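
For illustration, a handler would typically use it like this (a sketch; the
names are made up, and the per-CPU buffer mirrors what the BPF kprobe_multi
conversion later in this series does):

static DEFINE_PER_CPU(struct pt_regs, sample_partial_regs);

static void sample_handler(struct ftrace_regs *fregs)
{
	struct pt_regs *regs;

	/*
	 * Either copies the saved registers into the buffer, or, when
	 * ftrace_regs wraps a full pt_regs, just returns &fregs->regs.
	 */
	regs = ftrace_partial_regs(fregs, this_cpu_ptr(&sample_partial_regs));
	pr_info("probed ip=0x%lx\n", instruction_pointer(regs));
}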




[PATCH v4 25/33] fprobe: Use ftrace_regs in fprobe exit handler

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Change the fprobe exit handler to use ftrace_regs structure instead of
pt_regs. This also introduces HAVE_PT_REGS_TO_FTRACE_REGS_CAST, which means
the ftrace_regs' memory layout is identical to pt_regs so that one can be
cast to the other. Fprobe gains a new dependency on this.

Signed-off-by: Masami Hiramatsu (Google) 
---
  Changes in v3:
   - Use ftrace_regs_get_return_value()
  Changes from previous series: NOTHING, just forward ported.
---
 arch/loongarch/Kconfig  |1 +
 arch/s390/Kconfig   |1 +
 arch/x86/Kconfig|1 +
 include/linux/fprobe.h  |2 +-
 include/linux/ftrace.h  |5 +
 kernel/trace/Kconfig|8 
 kernel/trace/bpf_trace.c|6 +-
 kernel/trace/fprobe.c   |3 ++-
 kernel/trace/trace_fprobe.c |6 +-
 lib/test_fprobe.c   |6 +++---
 samples/fprobe/fprobe_example.c |2 +-
 11 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..b0bd252aefe8 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -108,6 +108,7 @@ config LOONGARCH
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3bec98d20283..122e9d6e3ad3 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -168,6 +168,7 @@ config S390
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT if HAVE_MARCH_Z196_FEATURES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3b955c9e4eb6..1d1da801da7f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -209,6 +209,7 @@ config X86
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif X86_64
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST if X86_64
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SAMPLE_FTRACE_DIRECTif X86_64
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if X86_64
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 36c0595f7b93..879a30956009 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -38,7 +38,7 @@ struct fprobe {
 unsigned long ret_ip, struct ftrace_regs *regs,
 void *entry_data);
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
-unsigned long ret_ip, struct pt_regs *regs,
+unsigned long ret_ip, struct ftrace_regs *fregs,
 void *entry_data);
 };
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index da2a23f5a9ed..a72a2eaec576 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -159,6 +159,11 @@ struct ftrace_regs {
 #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
 #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
+#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+
+static_assert(sizeof(struct pt_regs) == sizeof(struct ftrace_regs));
+
+#endif /* CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
 
 static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs 
*fregs)
 {
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 805d72ab77c6..1a2544712690 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -60,6 +60,13 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS
 This allows for use of ftrace_regs_get_argument() and
 ftrace_regs_get_stack_pointer().
 
+config HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+   bool
+   help
+If this is set, the memory layout of the ftrace_regs data structure
+is the same as that of pt_regs, so a pt_regs can be cast to
+ftrace_regs.
+
 config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
bool
help
@@ -291,6 +298,7 @@ config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
+   depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || 
!HAVE_DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d3f8745d8ead..efb792f8f2ea 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2749,10 +2749,14 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned 
long fentry_ip,
 
 static void
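
To illustrate what the new static_assert() guarantees, a sketch (the helper
name below is made up, not part of the patch):

#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
static inline struct ftrace_regs *pt_regs_to_ftrace_regs(struct pt_regs *regs)
{
	/* Legal only because both memory layouts are asserted identical. */
	return (struct ftrace_regs *)regs;
}
#endif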
 

[PATCH v4 23/33] arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Enable CONFIG_HAVE_FUNCTION_GRAPH_FREGS on arm64. Note that this
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS which is enabled if the
compiler supports "-fpatchable-function-entry=2". If not, it
continues to use fgraph_ret_regs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
   - Newly added.
---
 arch/arm64/Kconfig   |2 ++
 arch/arm64/include/asm/ftrace.h  |6 ++
 arch/arm64/kernel/entry-ftrace.S |   28 
 3 files changed, 36 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b071a00425d..beebc724dcae 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -192,6 +192,8 @@ config ARM64
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS \
if $(cc-option,-fpatchable-function-entry=2)
+   select HAVE_FUNCTION_GRAPH_FREGS \
+   if HAVE_DYNAMIC_FTRACE_WITH_ARGS
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
if DYNAMIC_FTRACE_WITH_ARGS && DYNAMIC_FTRACE_WITH_CALL_OPS
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS \
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ab158196480c..efd5dbf74dd6 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -131,6 +131,12 @@ ftrace_regs_set_return_value(struct ftrace_regs *fregs,
fregs->regs[0] = ret;
 }
 
+static __always_inline unsigned long
+ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
+{
+   return fregs->fp;
+}
+
 static __always_inline void
 ftrace_override_function_with_return(struct ftrace_regs *fregs)
 {
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index f0c16640ef21..d87ccdb9e678 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -328,6 +328,33 @@ SYM_FUNC_END(ftrace_stub_graph)
  * Run ftrace_return_to_handler() before going back to parent.
  * @fp is checked against the value passed by ftrace_graph_caller().
  */
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+SYM_CODE_START(return_to_handler)
+   /* save ftrace_regs except for PC */
+   sub sp, sp, #FREGS_SIZE
+   stp x0, x1, [sp, #FREGS_X0]
+   stp x2, x3, [sp, #FREGS_X2]
+   stp x4, x5, [sp, #FREGS_X4]
+   stp x6, x7, [sp, #FREGS_X6]
+   str x8, [sp, #FREGS_X8]
+   str x29, [sp, #FREGS_FP]
+   str x9,  [sp, #FREGS_LR]
+   str x10, [sp, #FREGS_SP]
+
+   mov x0, sp
+   bl  ftrace_return_to_handler	// addr = ftrace_return_to_handler(fregs);
+   mov x30, x0				// restore the original return address
+
+   /* restore return value regs */
+   ldp x0, x1, [sp, #FREGS_X0]
+   ldp x2, x3, [sp, #FREGS_X2]
+   ldp x4, x5, [sp, #FREGS_X4]
+   ldp x6, x7, [sp, #FREGS_X6]
+   add sp, sp, #FREGS_SIZE
+
+   ret
+SYM_CODE_END(return_to_handler)
+#else /* !CONFIG_HAVE_FUNCTION_GRAPH_FREGS */
 SYM_CODE_START(return_to_handler)
/* save return value regs */
sub sp, sp, #FGRET_REGS_SIZE
@@ -350,4 +377,5 @@ SYM_CODE_START(return_to_handler)
 
ret
 SYM_CODE_END(return_to_handler)
+#endif /* CONFIG_HAVE_FUNCTION_GRAPH_FREGS */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */




[PATCH v4 24/33] fprobe: Use ftrace_regs in fprobe entry handler

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, then we can enable fprobe
on arm64.

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes from previous series: NOTHING, just forward ported.
---
 include/linux/fprobe.h  |2 +-
 kernel/trace/Kconfig|3 ++-
 kernel/trace/bpf_trace.c|   10 +++---
 kernel/trace/fprobe.c   |4 ++--
 kernel/trace/trace_fprobe.c |6 +-
 lib/test_fprobe.c   |4 ++--
 samples/fprobe/fprobe_example.c |2 +-
 7 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 3e03758151f4..36c0595f7b93 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -35,7 +35,7 @@ struct fprobe {
int nr_maxactive;
 
int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
-unsigned long ret_ip, struct pt_regs *regs,
+unsigned long ret_ip, struct ftrace_regs *regs,
 void *entry_data);
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
 unsigned long ret_ip, struct pt_regs *regs,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 308b3bec01b1..805d72ab77c6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -290,7 +290,7 @@ config DYNAMIC_FTRACE_WITH_ARGS
 config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
-   depends on DYNAMIC_FTRACE_WITH_REGS
+   depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
@@ -675,6 +675,7 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
+   depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 84e8a0f6e4e0..d3f8745d8ead 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2503,7 +2503,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#ifdef CONFIG_FPROBE
+#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
 struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2733,10 +2733,14 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link 
*link,
 
 static int
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
  void *data)
 {
struct bpf_kprobe_multi_link *link;
+   struct pt_regs *regs = ftrace_get_regs(fregs);
+
+   if (!regs)
+   return 0;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
@@ -3008,7 +3012,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr 
*attr, struct bpf_prog *pr
kvfree(cookies);
return err;
 }
-#else /* !CONFIG_FPROBE */
+#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog 
*prog)
 {
return -EOPNOTSUPP;
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 6cd2a4e3afb8..f12569494d8a 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -46,7 +46,7 @@ static inline void __fprobe_handler(unsigned long ip, 
unsigned long parent_ip,
}
 
if (fp->entry_handler)
-   ret = fp->entry_handler(fp, ip, parent_ip, ftrace_get_regs(fregs), entry_data);
+   ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);
 
/* If entry_handler returns !0, nmissed is not counted. */
if (rh) {
@@ -182,7 +182,7 @@ static void fprobe_init(struct fprobe *fp)
fp->ops.func = fprobe_kprobe_handler;
else
fp->ops.func = fprobe_handler;
-   fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
+   fp->ops.flags |= FTRACE_OPS_FL_SAVE_ARGS;
 }
 
 static int fprobe_init_rethook(struct fprobe *fp, int num)
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 7d2ddbcfa377..ef6b36fd05ae 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -320,12 +320,16 @@ NOKPROBE_SYMBOL(fexit_perf_func);
 #endif /* CONFIG_PERF_EVENTS */
 
 static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
-unsigned long ret_ip, struct pt_regs *regs,
+unsigned long ret_ip, struct ftrace_regs *fregs,
 

[PATCH v4 22/33] tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Rename ftrace_regs_return_value to ftrace_regs_get_return_value to match
the other ftrace_regs_get/set_* APIs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Newly added.
---
 arch/loongarch/include/asm/ftrace.h |2 +-
 arch/powerpc/include/asm/ftrace.h   |2 +-
 arch/s390/include/asm/ftrace.h  |2 +-
 arch/x86/include/asm/ftrace.h   |2 +-
 include/linux/ftrace.h  |2 +-
 kernel/trace/fgraph.c   |2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/include/asm/ftrace.h 
b/arch/loongarch/include/asm/ftrace.h
index a11996eb5892..a9c3d0f2f941 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -70,7 +70,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
*fregs, unsigned long ip)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..7e138e0e3baf 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -69,7 +69,7 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 5a82b08f03cd..01e775c98425 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -88,7 +88,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 0b306c82855d..a061f8832b20 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -64,7 +64,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 79875a00c02b..da2a23f5a9ed 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -187,7 +187,7 @@ static __always_inline bool ftrace_regs_has_args(struct 
ftrace_regs *fregs)
regs_get_kernel_argument(ftrace_get_regs(fregs), n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(ftrace_get_regs(fregs))
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(ftrace_get_regs(fregs))
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(ftrace_get_regs(fregs), ret)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0ac242d22724..ff830e7f8966 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -783,7 +783,7 @@ static void fgraph_call_retfunc(struct ftrace_regs *fregs,
trace.rettime = trace_clock_local();
 #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
if (fregs)
-   trace.retval = ftrace_regs_return_value(fregs);
+   trace.retval = ftrace_regs_get_return_value(fregs);
else
trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif




[PATCH v4 21/33] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
on the stack in ftrace_graph return trampoline so that the callbacks
can access registers via ftrace_regs APIs.

Note that this only recovers the 'rax' and 'rdx' registers because the
other registers are not used anymore and are recovered by the caller.
'rax' and 'rdx' are used for passing the return value.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Add a comment about rip.
 Changes in v2:
  - Save rsp register and drop clearing orig_ax.
---
 arch/x86/Kconfig|3 ++-
 arch/x86/kernel/ftrace_64.S |   37 +
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..3b955c9e4eb6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,7 +219,8 @@ config X86
select HAVE_FAST_GUP
select HAVE_FENTRY  if X86_64 || DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
-   select HAVE_FUNCTION_GRAPH_RETVAL   if HAVE_FUNCTION_GRAPH_TRACER
+   select HAVE_FUNCTION_GRAPH_FREGSif HAVE_DYNAMIC_FTRACE_WITH_ARGS
+   select HAVE_FUNCTION_GRAPH_RETVAL   if 
!HAVE_DYNAMIC_FTRACE_WITH_ARGS
select HAVE_FUNCTION_GRAPH_TRACER   if X86_32 || (X86_64 && 
DYNAMIC_FTRACE)
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 214f30e9f0c0..8a16f774604e 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -348,21 +348,42 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
 SYM_CODE_START(return_to_handler)
UNWIND_HINT_UNDEFINED
ANNOTATE_NOENDBR
-   subq  $24, %rsp
+   /*
+* Save the registers required for ftrace_regs;
+* rax, rcx, rdx, rdi, rsi, r8, r9 and rbp
+*/
+   subq $(FRAME_SIZE), %rsp
+   movq %rax, RAX(%rsp)
+   movq %rcx, RCX(%rsp)
+   movq %rdx, RDX(%rsp)
+   movq %rsi, RSI(%rsp)
+   movq %rdi, RDI(%rsp)
+   movq %r8, R8(%rsp)
+   movq %r9, R9(%rsp)
+   movq %rbp, RBP(%rsp)
+   /*
+* orig_ax is not cleared because it is used for indicating the direct
+* trampoline in the fentry. And rip is not set because we don't know
+* the correct return address here.
+*/
+
+   leaq FRAME_SIZE(%rsp), %rcx
+   movq %rcx, RSP(%rsp)
 
-   /* Save the return values */
-   movq %rax, (%rsp)
-   movq %rdx, 8(%rsp)
-   movq %rbp, 16(%rsp)
movq %rsp, %rdi
 
call ftrace_return_to_handler
 
movq %rax, %rdi
-   movq 8(%rsp), %rdx
-   movq (%rsp), %rax
 
-   addq $24, %rsp
+   /*
+* Restore only rax and rdx because other registers are not used
+* for return value nor callee saved. Caller will reuse/recover it.
+*/
+   movq RDX(%rsp), %rdx
+   movq RAX(%rsp), %rax
+
+   addq $(FRAME_SIZE), %rsp
/*
 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
 * since IBT would demand that contain ENDBR, which simply isn't so for




[PATCH v4 20/33] function_graph: Add a new exit handler with parent_ip and ftrace_regs

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add a new return handler to fgraph_ops as 'retregfunc', which takes
parent_ip and ftrace_regs instead of ftrace_graph_ret. This handler
is available only if the arch supports CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
Note that 'retfunc' and 'retregfunc' are mutually exclusive.
You can set only one of them.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Update for new multiple fgraph.
  - Save the return address to instruction pointer in ftrace_regs.
---
 arch/x86/include/asm/ftrace.h |2 +
 include/linux/ftrace.h|   10 +-
 kernel/trace/Kconfig  |5 ++-
 kernel/trace/fgraph.c |   70 -
 4 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 415cf7a2ec2c..0b306c82855d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -72,6 +72,8 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
override_function_with_return(&(fregs)->regs)
 #define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
+#define ftrace_regs_get_frame_pointer(fregs) \
+   frame_pointer(&(fregs)->regs)
 
 struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6da6cc9aaeaf..79875a00c02b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -43,7 +43,9 @@ struct dyn_ftrace;
 
 char *arch_ftrace_match_adjust(char *str, const char *search);
 
-#ifdef CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+unsigned long ftrace_return_to_handler(struct ftrace_regs *fregs);
+#elif defined(CONFIG_HAVE_FUNCTION_GRAPH_RETVAL)
 struct fgraph_ret_regs;
 unsigned long ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs);
 #else
@@ -157,6 +159,7 @@ struct ftrace_regs {
 #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
 #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
+
 static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs 
*fregs)
 {
if (!fregs)
@@ -1067,6 +1070,10 @@ typedef int (*trace_func_graph_regs_ent_t)(unsigned long 
func,
   unsigned long parent_ip,
   struct ftrace_regs *fregs,
   struct fgraph_ops *); /* entry w/ 
regs */
+typedef void (*trace_func_graph_regs_ret_t)(unsigned long func,
+   unsigned long parent_ip,
+   struct ftrace_regs *,
+   struct fgraph_ops *); /* return w/ 
regs */
 
 extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct 
fgraph_ops *gops);
 
@@ -1076,6 +1083,7 @@ struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
trace_func_graph_regs_ent_t entryregfunc;
+   trace_func_graph_regs_ret_t retregfunc;
struct ftrace_ops   ops; /* for the hash lists */
void*private;
int idx;
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 61c541c36596..308b3bec01b1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -34,6 +34,9 @@ config HAVE_FUNCTION_GRAPH_TRACER
 config HAVE_FUNCTION_GRAPH_RETVAL
bool
 
+config HAVE_FUNCTION_GRAPH_FREGS
+   bool
+
 config HAVE_DYNAMIC_FTRACE
bool
help
@@ -232,7 +235,7 @@ config FUNCTION_GRAPH_TRACER
 
 config FUNCTION_GRAPH_RETVAL
bool "Kernel Function Graph Return Value"
-   depends on HAVE_FUNCTION_GRAPH_RETVAL
+   depends on HAVE_FUNCTION_GRAPH_RETVAL || HAVE_FUNCTION_GRAPH_FREGS
depends on FUNCTION_GRAPH_TRACER
default n
help
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 95b3eb4e8e23..0ac242d22724 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -685,8 +685,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned 
long func,
 
 /* Retrieve a function return address to the trace stack on thread info.*/
 static struct ftrace_ret_stack *
-ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
-   unsigned long frame_pointer, int *index)
+ftrace_pop_return_trace(unsigned long *ret, unsigned long frame_pointer,
+   int *index)
 {
struct ftrace_ret_stack *ret_stack;
 
@@ -731,10 +731,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
 
*index += FGRAPH_RET_INDEX;
*ret = ret_stack->ret;
-   trace->func = ret_stack->func;
-   trace->calltime = ret_stack->calltime;
-   trace->overrun = atomic_read(&ret_stack->trace_overrun);
-   trace->depth = current->curr_ret_depth;
/*
 * We still 

[PATCH v4 19/33] function_graph: Add a new entry handler with parent_ip and ftrace_regs

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add a new entry handler to fgraph_ops as 'entryregfunc' which takes
parent_ip and ftrace_regs. Note that 'entryfunc' and 'entryregfunc'
are mutually exclusive; you can set only one of them.
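
A minimal sketch of a user of the new handler; only the entryregfunc
signature comes from this patch, the body is made up, and
ftrace_regs_get_argument() is the existing accessor (available with
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS):

static int my_entryregfunc(unsigned long func, unsigned long parent_ip,
			   struct ftrace_regs *fregs, struct fgraph_ops *gops)
{
	/* Unlike entryfunc, the register state at function entry is
	 * available here, e.g. the first function argument: */
	unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);

	return arg0 != 0;	/* only trace the return if arg0 is set */
}

static struct fgraph_ops my_gops = {
	.entryregfunc = my_entryregfunc,	/* must not set .entryfunc too */
};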

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Update for new multiple fgraph.
---
 arch/arm64/kernel/ftrace.c   |2 +
 arch/loongarch/kernel/ftrace_dyn.c   |6 
 arch/powerpc/kernel/trace/ftrace.c   |2 +
 arch/powerpc/kernel/trace/ftrace_64_pg.c |   10 ---
 arch/x86/kernel/ftrace.c |   42 --
 include/linux/ftrace.h   |   19 +++---
 kernel/trace/fgraph.c|   30 +
 7 files changed, 76 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 205937e04ece..9d71a475d4c5 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -495,7 +495,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;
 
-   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, gops))
+   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, fregs, gops))
*parent = (unsigned long)&return_to_handler;
 
ftrace_test_recursion_unlock(bit);
diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..39b3f09a5e0c 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -244,7 +244,11 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
struct pt_regs *regs = &fregs->regs;
unsigned long *parent = (unsigned long *)&regs->regs[1];
 
-   prepare_ftrace_return(ip, (unsigned long *)parent);
+   if (unlikely(atomic_read(&current->tracing_graph_pause)))
+   return;
+
+   if (!function_graph_enter_regs(regs->regs[1], ip, 0, parent, fregs))
+   regs->regs[1] = (unsigned long)&return_to_handler;
 }
 #else
 static int ftrace_modify_graph_caller(bool enable)
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..9bf1b6912116 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -422,7 +422,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
goto out;
 
-   if (!function_graph_enter(parent_ip, ip, 0, (unsigned long *)sp))
+   if (!function_graph_enter_regs(parent_ip, ip, 0, (unsigned long *)sp, 
fregs))
parent_ip = ppc_function_entry(return_to_handler);
 
ftrace_test_recursion_unlock(bit);
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 7b85c3b460a3..43f6cfaaf7db 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -795,7 +795,8 @@ int ftrace_disable_ftrace_graph_caller(void)
  * in current thread info. Return the address we want to divert to.
  */
 static unsigned long
-__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long 
sp)
+__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long 
sp,
+   struct ftrace_regs *fregs)
 {
unsigned long return_hooker;
int bit;
@@ -812,7 +813,7 @@ __prepare_ftrace_return(unsigned long parent, unsigned long 
ip, unsigned long sp
 
return_hooker = ppc_function_entry(return_to_handler);
 
-   if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
+   if (!function_graph_enter_regs(parent, ip, 0, (unsigned long *)sp, 
fregs))
parent = return_hooker;
 
ftrace_test_recursion_unlock(bit);
@@ -824,13 +825,14 @@ __prepare_ftrace_return(unsigned long parent, unsigned 
long ip, unsigned long sp
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, 
fregs->regs.gpr[1]);
+   fregs->regs.link = __prepare_ftrace_return(parent_ip, ip,
+  fregs->regs.gpr[1], fregs);
 }
 #else
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp)
 {
-   return __prepare_ftrace_return(parent, ip, sp);
+   return __prepare_ftrace_return(parent, ip, sp, NULL);
 }
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 845e29b4254f..0f757e399a96 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -614,16 +614,8 @@ int ftrace_disable_ftrace_graph_caller(void)
 }
 #endif /* CONFIG_DYNAMIC_FTRACE && !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
-/*
- * Hook the return address and push it in the stack of return addrs
- 

[PATCH v4 18/33] function_graph: Add selftest for passing local variables

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add a boot up selftest that passes variables from a function entry to a
function exit, and make sure that they do get passed around.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Add reserved size test.
  - Use pr_*() instead of printk(KERN_*).
---
 kernel/trace/trace_selftest.c |  169 +
 1 file changed, 169 insertions(+)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index f0758afa2f7d..4d86cd4c8c8c 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -756,6 +756,173 @@ trace_selftest_startup_function(struct tracer *trace, 
struct trace_array *tr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   int size = fgraph_store_size;
+   void *p;
+
+   p = fgraph_reserve_data(gops->idx, size);
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to reserve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return 0;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   *(char *)p = BYTE_NUMBER;
+   break;
+   case 2:
+   *(short *)p = SHORT_NUMBER;
+   break;
+   case 4:
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 8:
+   *(long long *)p = LONG_NUMBER;
+   break;
+   }
+
+   return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+   struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   long long expect = 0;
+   long long found = -1;
+   int size;
+   char *p;
+
+   p = fgraph_retrieve_data(gops->idx, &size);
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to retrieve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+   if (fgraph_store_size > size) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Retrieved size %d is smaller than expected %d\n",
+size, (int)fgraph_store_size);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   expect = BYTE_NUMBER;
+   found = *(char *)p;
+   break;
+   case 2:
+   expect = SHORT_NUMBER;
+   found = *(short *)p;
+   break;
+   case 4:
+   expect = WORD_NUMBER;
+   found = *(int *)p;
+   break;
+   case 8:
+   expect = LONG_NUMBER;
+   found = *(long long *)p;
+   break;
+   }
+
+   if (found != expect) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"%s returned not %lld but %lld\n", type, expect, 
found);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+   fgraph_error_str = NULL;
+}
+
+static struct fgraph_ops store_bytes __initdata = {
+   .entryfunc  = store_entry,
+   .retfunc= store_return,
+};
+
+static int __init test_graph_storage_type(const char *name, int size)
+{
+   char *func_name;
+   int len;
+   int ret;
+
+   fgraph_store_type_name = name;
+   fgraph_store_size = size;
+
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to execute storage %s\n", name);
+   fgraph_error_str = fgraph_error_str_buf;
+
+   pr_cont("PASSED\n");
+   pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : 
"");
+
+   func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+   len = strlen(func_name);
+
+   ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+   if (ret && ret != -ENODEV) {
+   pr_cont("*Could not set filter* ");
+   return -1;
+   }
+
+   ret = register_ftrace_graph(&store_bytes);
+   if (ret) {
+   pr_warn("Failed to init store_bytes fgraph tracing\n");
+   return -1;
+   }
+
+   DYN_FTRACE_TEST_NAME();
+
+   unregister_ftrace_graph(&store_bytes);

[PATCH v4 17/33] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add functions that can be called by a fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced and the exit of
the same function. The fgraph_ops entryfunc() may call
fgraph_reserve_data() to store up to 32 words onto the task's shadow
ret_stack, which can then be retrieved by fgraph_retrieve_data() called
by the corresponding retfunc().
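
As a sketch (assuming the API as described here; the callbacks and the
u64 timestamp payload are hypothetical), a latency probe could pair the
two calls like this:

static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
{
	u64 *start = fgraph_reserve_data(gops->idx, sizeof(*start));

	if (!start)
		return 0;	/* shadow stack full: skip this function */
	*start = trace_clock_local();
	return 1;
}

static void my_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops)
{
	int size;
	u64 *start = fgraph_retrieve_data(gops->idx, &size);

	if (start)
		pr_info("%ps took %llu ns\n", (void *)trace->func,
			trace_clock_local() - *start);
}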

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Store fgraph_array index to the data entry.
  - Both function requires fgraph_array index to store/retrieve data.
  - Reserve correct size of the data.
  - Return correct data area.
 Changes in v2:
  - Retrieve the reserved size by fgraph_retrieve_data().
  - Expand the maximum data size to 32 words.
  - Update stack index with __get_index(val) if FGRAPH_TYPE_ARRAY entry.
  - fix typos and make description lines shorter than 76 chars.
---
 include/linux/ftrace.h |3 +
 kernel/trace/fgraph.c  |  175 ++--
 2 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 09ca4bba63f2..1c121237016e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1075,6 +1075,9 @@ struct fgraph_ops {
int idx;
 };
 
+void *fgraph_reserve_data(int idx, int size_bytes);
+void *fgraph_retrieve_data(int idx, int *size_bytes);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 5ac6477bcf20..d4d7007bf619 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -41,17 +41,29 @@
  * bits: 10 - 11   Type of storage
  *   0 - reserved
  *   1 - bitmap of fgraph_array index
+ *   2 - reserved data
  *
  * For bitmap of fgraph_array index
  *  bits: 12 - 27  The bitmap of fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 12 - 17  The size in words that is stored
+ *  bits: 18 - 23  The index of fgraph_array, which shows who is stored
+ *
 * That is, at the end of function_graph_enter, if the first and fourth
 * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
  *
 * |                                    | <- task->curr_ret_stack
 * +------------------------------------+
 * | data_type(idx:3, size:2,           |
 * |   offset:FGRAPH_RET_INDEX+3)       | ( Data with size of 2 words)
 * +------------------------------------+ ( It is 4 words from the ret_stack)
 * |         STORED DATA WORD 2         |
 * |         STORED DATA WORD 1         |
 * +------------------------------------+
 * | bitmap_type(bitmap:(BIT(3)|BIT(0)),|
 * |         offset:FGRAPH_RET_INDEX)   | <- the offset is from here
 * +------------------------------------+
@@ -78,14 +90,23 @@
 enum {
FGRAPH_TYPE_RESERVED= 0,
FGRAPH_TYPE_BITMAP  = 1,
+   FGRAPH_TYPE_DATA= 2,
 };
 
 #define FGRAPH_INDEX_SIZE  16
 #define FGRAPH_INDEX_MASK  GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
 #define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
-/* Currently the max stack index can't be more than register callers */
-#define FGRAPH_MAX_INDEX   (FGRAPH_INDEX_SIZE + FGRAPH_RET_INDEX)
+#define FGRAPH_DATA_SIZE   5
+#define FGRAPH_DATA_MASK   ((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT  (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
+#define FGRAPH_DATA_INDEX_SIZE 4
+#define FGRAPH_DATA_INDEX_MASK ((1 << FGRAPH_DATA_INDEX_SIZE) - 1)
+#define FGRAPH_DATA_INDEX_SHIFT(FGRAPH_DATA_SHIFT + FGRAPH_DATA_SIZE)
+
+#define FGRAPH_MAX_INDEX   \
+   ((FGRAPH_INDEX_SIZE << FGRAPH_DATA_SIZE) + FGRAPH_RET_INDEX)
 
 #define FGRAPH_ARRAY_SIZE  FGRAPH_INDEX_SIZE
 
@@ -97,6 +118,8 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
 
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
+
 /*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
  * ret_stack to store task specific state.
@@ -145,14 +168,39 @@ static int fgraph_lru_alloc_index(void)
return idx;
 }
 
+static inline int __get_index(unsigned long val)
+{
+   return val & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int __get_type(unsigned long val)
+{
+   return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline int __get_data_index(unsigned long val)
+{
+   return 

[PATCH v4 16/33] function_graph: Move graph notrace bit to shadow stack global var

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of the task->trace_recursion for the logic used for the function
graph no-trace was a bit of an abuse of that variable. Now that there
exist global vars that are per stack for registered graph traces, use
those instead.
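
In other words (a sketch, not part of the patch), each fgraph user now
tests and updates its own per-stack word rather than the shared
task->trace_recursion bits:

static bool my_in_notrace(struct fgraph_ops *gops)
{
	unsigned long *task_var = fgraph_get_task_var(gops);

	/* Private to this fgraph_ops instance and this task. */
	return *task_var & TRACE_GRAPH_NOTRACE;
}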

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Make description lines shorter than 76 chars.
---
 include/linux/trace_recursion.h  |7 ---
 kernel/trace/trace.h |9 +
 kernel/trace/trace_functions_graph.c |   10 ++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 00e792bf148d..cc11b0e9d220 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,13 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /*
-* To implement set_graph_notrace, if this bit is set, we ignore
-* function graph tracing of called functions, until the return
-* function is called to clear it.
-*/
-   TRACE_GRAPH_NOTRACE_BIT,
-
/* Used to prevent recursion recording from recursing. */
TRACE_RECORD_RECURSION_BIT,
 };
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e3f452eda0e3..fac29f08d5d8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -915,8 +915,17 @@ enum {
 
TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+   /*
+* To implement set_graph_notrace, if this bit is set, we ignore
+* function graph tracing of called functions, until the return
+* function is called to clear it.
+*/
+   TRACE_GRAPH_NOTRACE_BIT,
 };
 
+#define TRACE_GRAPH_NOTRACE	(1 << TRACE_GRAPH_NOTRACE_BIT)
+
 static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
 {
return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 66cce73e94f8..13d0387ac6a6 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -130,6 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
  struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -138,7 +139,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
int ret;
int cpu;
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+   if (*task_var & TRACE_GRAPH_NOTRACE)
return 0;
 
/*
@@ -149,7 +150,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 * returning from the function.
 */
if (ftrace_graph_notrace_addr(trace->func)) {
-   trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+   *task_var |= TRACE_GRAPH_NOTRACE;
/*
 * Need to return 1 to have the return called
 * that will clear the NOTRACE bit.
@@ -240,6 +241,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -249,8 +251,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 
ftrace_graph_addr_finish(gops, trace);
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   if (*task_var & TRACE_GRAPH_NOTRACE) {
+   *task_var &= ~TRACE_GRAPH_NOTRACE;
return;
}
 




[PATCH v4 15/33] function_graph: Move graph depth stored data to shadow stack global var

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of the task->trace_recursion for the logic used for the function
graph depth was a bit of an abuse of that variable. Now that there
exist global vars that are per stack for registered graph traces, use
those instead.
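
A standalone toy showing the same 2-bit depth encoding that the new
ftrace_graph_set_depth()/ftrace_graph_depth() helpers implement (the bit
position here is illustrative):

#include <stdio.h>

#define DEPTH_START_BIT 1	/* assumed: the bit after TRACE_GRAPH_FL */

static unsigned long set_depth(unsigned long var, int depth)
{
	var &= ~(3UL << DEPTH_START_BIT);
	var |= ((unsigned long)depth & 3) << DEPTH_START_BIT;
	return var;
}

int main(void)
{
	unsigned long var = 1;		/* TRACE_GRAPH_FL set */

	var = set_depth(var, 3);	/* NMI -> irq -> softirq case */
	printf("depth = %lu\n", (var >> DEPTH_START_BIT) & 3);
	return 0;			/* prints "depth = 3" */
}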

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/trace_recursion.h |   29 -
 kernel/trace/trace.h|   34 --
 2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 2efd5ec46d7f..00e792bf148d 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,25 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /*
-* In the very unlikely case that an interrupt came in
-* at a start of graph tracing, and we want to trace
-* the function in that interrupt, the depth can be greater
-* than zero, because of the preempted start of a previous
-* trace. In an even more unlikely case, depth could be 2
-* if a softirq interrupted the start of graph tracing,
-* followed by an interrupt preempting a start of graph
-* tracing in the softirq, and depth can even be 3
-* if an NMI came in at the start of an interrupt function
-* that preempted a softirq start of a function that
-* preempted normal context. Luckily, it can't be
-* greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_FL
-*/
-
-   TRACE_GRAPH_DEPTH_START_BIT,
-   TRACE_GRAPH_DEPTH_END_BIT,
-
/*
 * To implement set_graph_notrace, if this bit is set, we ignore
 * function graph tracing of called functions, until the return
@@ -78,16 +59,6 @@ enum {
 #define trace_recursion_clear(bit) do { (current)->trace_recursion &= 
~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)  ((current)->trace_recursion & 
(1<<(bit)))
 
-#define trace_recursion_depth() \
-   (((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
-   do {\
-   current->trace_recursion &= \
-   ~(3 << TRACE_GRAPH_DEPTH_START_BIT);\
-   current->trace_recursion |= \
-   ((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;   \
-   } while (0)
-
 #define TRACE_CONTEXT_BITS 4
 
 #define TRACE_FTRACE_START TRACE_FTRACE_BIT
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3b9266aeb43b..e3f452eda0e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -896,8 +896,38 @@ extern void free_fgraph_ops(struct trace_array *tr);
 
 enum {
TRACE_GRAPH_FL  = 1,
+
+   /*
+* In the very unlikely case that an interrupt came in
+* at a start of graph tracing, and we want to trace
+* the function in that interrupt, the depth can be greater
+* than zero, because of the preempted start of a previous
+* trace. In an even more unlikely case, depth could be 2
+* if a softirq interrupted the start of graph tracing,
+* followed by an interrupt preempting a start of graph
+* tracing in the softirq, and depth can even be 3
+* if an NMI came in at the start of an interrupt function
+* that preempted a softirq start of a function that
+* preempted normal context. Luckily, it can't be
+* greater than 3, so the next two bits are a mask
+* of what the depth is when we set TRACE_GRAPH_FL
+*/
+
+   TRACE_GRAPH_DEPTH_START_BIT,
+   TRACE_GRAPH_DEPTH_END_BIT,
 };
 
+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+   return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+   *task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+   *task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
@@ -930,7 +960,7 @@ ftrace_graph_addr(unsigned long *task_var, struct 
ftrace_graph_ent *trace)
 * when the depth is zero.
 */
*task_var |= TRACE_GRAPH_FL;
-   trace_recursion_set_depth(trace->depth);
+   ftrace_graph_set_depth(task_var, trace->depth);
 
/*
 * If no irqs are to be traced, but a set_graph_function
@@ -955,7 +985,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct 
ftrace_graph_ret *trace
unsigned long *task_var = fgraph_get_task_var(gops);
 
if 

[PATCH v4 14/33] function_graph: Move set_graph_function tests to shadow stack global var

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of the task->trace_recursion for the logic used for
set_graph_function was a bit of an abuse of that variable. Now that there
exist global vars that are per stack for registered graph traces, use
those instead.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/trace_recursion.h  |5 +
 kernel/trace/trace.h |   32 +---
 kernel/trace/trace_functions_graph.c |6 +++---
 kernel/trace/trace_irqsoff.c |4 ++--
 kernel/trace/trace_sched_wakeup.c|4 ++--
 5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92d2364..2efd5ec46d7f 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,9 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /* Set if the function is in the set_graph_function file */
-   TRACE_GRAPH_BIT,
-
/*
 * In the very unlikely case that an interrupt came in
 * at a start of graph tracing, and we want to trace
@@ -60,7 +57,7 @@ enum {
 * that preempted a softirq start of a function that
 * preempted normal context Luckily, it can't be
 * greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_BIT
+* of what the depth is when we set TRACE_GRAPH_FL
 */
 
TRACE_GRAPH_DEPTH_START_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 02edfdb68933..3b9266aeb43b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -894,11 +894,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, 
struct ftrace_ops *ops
 extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
+enum {
+   TRACE_GRAPH_FL  = 1,
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
 
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
unsigned long addr = trace->func;
int ret = 0;
@@ -920,12 +925,11 @@ static inline int ftrace_graph_addr(struct 
ftrace_graph_ent *trace)
}
 
if (ftrace_lookup_ip(hash, addr)) {
-
/*
 * This needs to be cleared on the return functions
 * when the depth is zero.
 */
-   trace_recursion_set(TRACE_GRAPH_BIT);
+   *task_var |= TRACE_GRAPH_FL;
trace_recursion_set_depth(trace->depth);
 
/*
@@ -945,11 +949,14 @@ static inline int ftrace_graph_addr(struct 
ftrace_graph_ent *trace)
return ret;
 }
 
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret 
*trace)
 {
-   if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
+   if ((*task_var & TRACE_GRAPH_FL) &&
trace->depth == trace_recursion_depth())
-   trace_recursion_clear(TRACE_GRAPH_BIT);
+   *task_var &= ~TRACE_GRAPH_FL;
 }
 
 static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -976,7 +983,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long 
addr)
 }
 
 #else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct 
ftrace_graph_ent *trace)
 {
return 1;
 }
@@ -985,17 +992,20 @@ static inline int ftrace_graph_notrace_addr(unsigned long 
addr)
 {
return 0;
 }
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct 
ftrace_graph_ret *trace)
 { }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
 
-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent 
*trace)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
/* trace it when it is-nested-in or is a function enabled. */
-   return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
-ftrace_graph_addr(trace)) ||
+   return !((*task_var & TRACE_GRAPH_FL) ||
+ftrace_graph_addr(task_var, trace)) ||
(trace->depth < 0) ||
(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 7f30652f0e97..66cce73e94f8 100644
--- a/kernel/trace/trace_functions_graph.c
+++ 

[PATCH v4 13/33] function_graph: Add "task variables" per task for fgraph_ops

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add a "task variables" array on the tasks shadow ret_stack that is the
size of longs for each possible registered fgraph_ops. That's a total
of 16, taking up 8 * 16 = 128 bytes (out of a page size 4k).

This will allow fgraph_ops to implement per-task features by
having a way to maintain state for each task.
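
For example (a sketch; the counter is hypothetical), an fgraph user can
keep a per-task counter without any external hash table:

static int count_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
{
	unsigned long *count = fgraph_get_task_var(gops);

	(*count)++;	/* private to this task and this fgraph_ops */
	return 0;	/* no retfunc needed in this example */
}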

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Move fgraph_ops::idx to previous patch in the series.
 Changes in v2:
  - Make description lines shorter than 76 chars.
---
 include/linux/ftrace.h |1 +
 kernel/trace/fgraph.c  |   70 +++-
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index c431a33fe789..09ca4bba63f2 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1116,6 +1116,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
idx);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index d8b139959b62..5ac6477bcf20 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -92,10 +92,18 @@ enum {
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
+#define SHADOW_STACK_MAX_INDEX \
+   (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1 + FGRAPH_ARRAY_SIZE))
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
 
+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+   ((unsigned long *)(&(ret_stack)[SHADOW_STACK_INDEX - 
FGRAPH_ARRAY_SIZE]))
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -182,6 +190,44 @@ static void return_run(struct ftrace_graph_ret *trace, 
struct fgraph_ops *ops)
 {
 }
 
+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+   gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+   return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+   memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The ftrace_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address of the fgraph_ops @gops task-specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+   return ret_stack_get_task_var(current, gops->idx);
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -788,6 +834,7 @@ static int alloc_retstack_tasklist(unsigned long 
**ret_stack_list)
 
if (t->ret_stack == NULL) {
atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack_list[start]);
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
/* Make sure the tasks see the 0 first: */
@@ -848,6 +895,7 @@ static void
 graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack);
t->ftrace_timestamp = 0;
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
@@ -946,6 +994,24 @@ static int start_graph_tracing(void)
return ret;
 }
 
+static void init_task_vars(int idx)
+{
+   struct task_struct *g, *t;
+   int cpu;
+
+   for_each_online_cpu(cpu) {
+   if (idle_task(cpu)->ret_stack)
+   ret_stack_set_task_var(idle_task(cpu), idx, 0);
+   }
+
+   read_lock(&tasklist_lock);
+   for_each_process_thread(g, t) {
+   if (t->ret_stack)
+   ret_stack_set_task_var(t, idx, 0);
+   }
+   read_unlock(&tasklist_lock);
+}
+
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
int command = 0;
@@ -995,6 +1061,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)

[PATCH v4 12/33] function_graph: Use a simple LRU for fgraph_array index number

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Since the fgraph_array index is used for the bitmap on the shadow
stack, it may leave some entries after a function_graph instance is
removed. Thus if another instance reuses the fgraph_array index soon
after releasing it, the fgraph may mistakenly call the newer callback
for entries that were pushed by the older instance.
To avoid reusing a fgraph_array index soon after it is released,
introduce a simple LRU table for managing the index numbers. This will
reduce the possibility of this confusion.
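
The queueing behaviour can be demonstrated with a standalone reduction
of the table below (N shrunk from FGRAPH_ARRAY_SIZE=16 to 4, and the
sanity checks from the patch dropped, for brevity):

#include <stdio.h>

#define N 4

static int table[N] = { 0, 1, 2, 3 };
static int next, last;

static int alloc_index(void)
{
	int idx = table[next];

	if (idx == -1)
		return -1;	/* all indexes are in use */
	table[next] = -1;
	next = (next + 1) % N;
	return idx;
}

static void release_index(int idx)
{
	table[last] = idx;	/* freed index goes to the back of the queue */
	last = (last + 1) % N;
}

int main(void)
{
	release_index(alloc_index());	/* take index 0, free it at once */
	printf("%d %d %d\n", alloc_index(), alloc_index(), alloc_index());
	return 0;	/* prints "1 2 3": index 0 is handed out last */
}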

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v4:
  - Newly added.
---
 kernel/trace/fgraph.c |   67 ++---
 1 file changed, 47 insertions(+), 20 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6f537ebd3ed7..d8b139959b62 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -99,10 +99,44 @@ enum {
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
-static int fgraph_array_cnt;
-
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+/* LRU index table for fgraph_array */
+static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
+static int fgraph_lru_next;
+static int fgraph_lru_last;
+
+static void fgraph_lru_init(void)
+{
+   int i;
+
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+   fgraph_lru_table[i] = i;
+}
+
+static int fgraph_lru_release_index(int idx)
+{
+   if (idx < 0 || idx >= FGRAPH_ARRAY_SIZE ||
+   fgraph_lru_table[fgraph_lru_last] != -1)
+   return -1;
+
+   fgraph_lru_table[fgraph_lru_last] = idx;
+   fgraph_lru_last = (fgraph_lru_last + 1) % FGRAPH_ARRAY_SIZE;
+   return fgraph_lru_last - 1;
+}
+
+static int fgraph_lru_alloc_index(void)
+{
+   int idx = fgraph_lru_table[fgraph_lru_next];
+
+   if (idx == -1)
+   return -1;
+
+   fgraph_lru_table[fgraph_lru_next] = -1;
+   fgraph_lru_next = (fgraph_lru_next + 1) % FGRAPH_ARRAY_SIZE;
+   return idx;
+}
+
 static inline int get_ret_stack_index(struct task_struct *t, int offset)
 {
return t->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
@@ -367,7 +401,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
if (index < 0)
goto out;
 
-   for (i = 0; i < fgraph_array_cnt; i++) {
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
struct fgraph_ops *gops = fgraph_array[i];
 
if (gops == &fgraph_stub)
@@ -932,21 +966,17 @@ int register_ftrace_graph(struct fgraph_ops *gops)
/* The array must always have real data on it */
for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
fgraph_array[i] = &fgraph_stub;
+   fgraph_lru_init();
}
 
-   /* Look for an available spot */
-   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
-   if (fgraph_array[i] == &fgraph_stub)
-   break;
-   }
-   if (i >= FGRAPH_ARRAY_SIZE) {
+   i = fgraph_lru_alloc_index();
+   if (i < 0 ||
+   WARN_ON_ONCE(fgraph_array[i] != &fgraph_stub)) {
ret = -EBUSY;
goto out;
}
 
fgraph_array[i] = gops;
-   if (i + 1 > fgraph_array_cnt)
-   fgraph_array_cnt = i + 1;
gops->idx = i;
 
ftrace_graph_active++;
@@ -976,25 +1006,22 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
int command = 0;
-   int i;
 
mutex_lock(_lock);
 
if (unlikely(!ftrace_graph_active))
goto out;
 
-   if (unlikely(gops->idx < 0 || gops->idx >= fgraph_array_cnt))
+   if (unlikely(gops->idx < 0 || gops->idx >= FGRAPH_ARRAY_SIZE))
+   goto out;
+
+   if (WARN_ON_ONCE(fgraph_array[gops->idx] != gops))
goto out;
 
-   WARN_ON_ONCE(fgraph_array[gops->idx] != gops);
+   if (fgraph_lru_release_index(gops->idx) < 0)
+   goto out;
 
fgraph_array[gops->idx] = &fgraph_stub;
-   if (gops->idx + 1 == fgraph_array_cnt) {
-   i = gops->idx;
-   while (i >= 0 && fgraph_array[i] == &fgraph_stub)
-   i--;
-   fgraph_array_cnt = i + 1;
-   }
 
ftrace_graph_active--;
 




[PATCH v4 11/33] function_graph: Have the instances use their own ftrace_ops for filtering

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Allow instances to have their own ftrace_ops as part of the fgraph_ops,
so that the function_graph tracer filters on the set_ftrace_filter file
of the instance and not the top instance.

This also changes how function_graph handles multiple instances on the
shadow stack. Previously we used ARRAY type entries to record which one
is enabled; this makes it a bitmap of the fgraph_array's indexes.
The previous function_graph_enter() expected to be called back from the
prepare_ftrace_return() function, which is called back only once if it
is enabled. But this introduces a different ftrace_ops for each fgraph
instance, and those are called from ftrace_graph_func() one by one. Thus
we can not loop on the fgraph_array(), and need to reuse the ret_stack
pushed by the previous instance. Finding the ret_stack is easy because
we can check ret_stack->func. But that is not enough for the self-
recursive tail-call case. Thus fgraph uses the bitmap entry to check
whether it is already set (that entry is for a previous tail call).
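
A standalone illustration of the bitmap word layout this patch puts on
the shadow stack (bits 0-9: offset, 10-11: type, 12-27: bitmap of
fgraph_array indexes; the offset value 3 is made up):

#include <stdio.h>

#define RET_INDEX_MASK	((1UL << 10) - 1)
#define TYPE_SHIFT	10
#define TYPE_BITMAP	1UL
#define INDEX_SHIFT	12

int main(void)
{
	/* fgraph_array indexes 0 and 3 want their retfunc called; the
	 * ftrace_ret_stack sits 3 words below this entry. */
	unsigned long val = ((1UL << 0 | 1UL << 3) << INDEX_SHIFT) |
			    (TYPE_BITMAP << TYPE_SHIFT) | 3;

	printf("offset=%lu bitmap=%#lx\n",
	       val & RET_INDEX_MASK, val >> INDEX_SHIFT);
	return 0;	/* prints "offset=3 bitmap=0x9" */
}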

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v4:
  - Simplify get_ret_stack() sanity check and use WARN_ON_ONCE() for
obviously wrong value.
  - Do not check ret == return_to_handler but always read the previous
ret_stack in ftrace_push_return_trace() to check it is reusable.
  - Set the bit 0 of the bitmap entry always in function_graph_enter()
because it uses bit 0 to check re-usability.
  - Fix to ensure the ret_stack entry is bitmap type when checking the
bitmap.
 Changes in v3:
  - Pass current fgraph_ops to the new entry handler
   (function_graph_enter_ops) if fgraph use ftrace.
  - Add fgraph_ops::idx in this patch.
  - Replace the array type with the bitmap type so that it can record
which fgraph is called.
  - Fix some helper function to use passed task_struct instead of current.
  - Reduce the ret-index size to 1024 words.
  - Make the ret-index directly points the ret_stack.
  - Fix ftrace_graph_ret_addr() to handle tail-call case correctly.
 Changes in v2:
  - Use ftrace_graph_func and FTRACE_OPS_GRAPH_STUB instead of
ftrace_stub and FTRACE_OPS_FL_STUB for new ftrace based fgraph.
---
 arch/arm64/kernel/ftrace.c   |   19 ++
 arch/x86/kernel/ftrace.c |   19 ++
 include/linux/ftrace.h   |7 +
 kernel/trace/fgraph.c|  369 --
 kernel/trace/ftrace.c|6 -
 kernel/trace/trace.h |   16 +
 kernel/trace/trace_functions.c   |2 
 kernel/trace/trace_functions_graph.c |8 +
 8 files changed, 277 insertions(+), 169 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..205937e04ece 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -481,7 +481,24 @@ void prepare_ftrace_return(unsigned long self_addr, 
unsigned long *parent,
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   prepare_ftrace_return(ip, &fregs->lr, fregs->fp);
+   unsigned long *parent = &fregs->lr;
+   struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+   int bit;
+
+   if (unlikely(ftrace_graph_is_dead()))
+   return;
+
+   if (unlikely(atomic_read(&current->tracing_graph_pause)))
+   return;
+
+   bit = ftrace_test_recursion_trylock(ip, *parent);
+   if (bit < 0)
+   return;
+
+   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, gops))
+   *parent = (unsigned long)&return_to_handler;
+
+   ftrace_test_recursion_unlock(bit);
 }
 #else
 /*
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 12df54ff0e81..845e29b4254f 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -657,9 +657,24 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
struct pt_regs *regs = &fregs->regs;
-   unsigned long *stack = (unsigned long *)kernel_stack_pointer(regs);
+   unsigned long *parent = (unsigned long *)kernel_stack_pointer(regs);
+   struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+   int bit;
+
+   if (unlikely(ftrace_graph_is_dead()))
+   return;
+
+   if (unlikely(atomic_read(&current->tracing_graph_pause)))
+   return;
 
-   prepare_ftrace_return(ip, (unsigned long *)stack, 0);
+   bit = ftrace_test_recursion_trylock(ip, *parent);
+   if (bit < 0)
+   return;
+
+   if (!function_graph_enter_ops(*parent, ip, 0, parent, gops))
+   *parent = (unsigned long)&return_to_handler;
+
+   ftrace_test_recursion_unlock(bit);
 }
 #endif
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7b08169aa51d..c431a33fe789 100644
--- 

[PATCH v4 10/33] ftrace: Allow ftrace startup flags exist without dynamic ftrace

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine as the difference
between dynamic ftrace and static ftrace is done within the internals of
ftrace itself. No need to have use cases fail to compile because dynamic
ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/ftrace.h |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0955baccbb87..7b08169aa51d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -538,6 +538,15 @@ static inline void stack_tracer_disable(void) { }
 static inline void stack_tracer_enable(void) { }
 #endif
 
+enum {
+   FTRACE_UPDATE_CALLS = (1 << 0),
+   FTRACE_DISABLE_CALLS= (1 << 1),
+   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
+   FTRACE_START_FUNC_RET   = (1 << 3),
+   FTRACE_STOP_FUNC_RET= (1 << 4),
+   FTRACE_MAY_SLEEP= (1 << 5),
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 void ftrace_arch_code_modify_prepare(void);
@@ -632,15 +641,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int 
len, int reset);
 void ftrace_free_filter(struct ftrace_ops *ops);
 void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
 
-enum {
-   FTRACE_UPDATE_CALLS = (1 << 0),
-   FTRACE_DISABLE_CALLS= (1 << 1),
-   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
-   FTRACE_START_FUNC_RET   = (1 << 3),
-   FTRACE_STOP_FUNC_RET= (1 << 4),
-   FTRACE_MAY_SLEEP= (1 << 5),
-};
-
 /*
  * The FTRACE_UPDATE_* enum is used to pass information back
  * from the ftrace_update_record() and ftrace_test_record()




[PATCH v4 09/33] ftrace: Allow function_graph tracer to be enabled in instances

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of the functions is
still joined by the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Fix to remove set_graph_array() completely.
---
 include/linux/ftrace.h   |1 +
 kernel/trace/ftrace.c|1 +
 kernel/trace/trace.h |   13 ++-
 kernel/trace/trace_functions.c   |8 
 kernel/trace/trace_functions_graph.c |   65 +-
 kernel/trace/trace_selftest.c|4 +-
 6 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index bc216cdc742d..0955baccbb87 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1070,6 +1070,7 @@ extern int ftrace_graph_entry_stub(struct 
ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
+   void*private;
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7ff5c454622a..83fbfb7b48f8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7319,6 +7319,7 @@ __init void ftrace_init_global_array_ops(struct 
trace_array *tr)
tr->ops = _ops;
tr->ops->private = tr;
ftrace_init_trace_array(tr);
+   init_array_fgraph_ops(tr);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d72d5e81f48a..16948c0ed00a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -395,6 +395,9 @@ struct trace_array {
struct ftrace_ops   *ops;
struct trace_pid_list   __rcu *function_pids;
struct trace_pid_list   __rcu *function_no_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   struct fgraph_ops   *gops;
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
/* All of these are protected by the ftrace_lock */
struct list_headfunc_probes;
@@ -677,7 +680,6 @@ void print_trace_header(struct seq_file *m, struct 
trace_iterator *iter);
 
 void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops 
*gops);
 int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
-void set_graph_array(struct trace_array *tr);
 
 void tracing_start_cmdline_record(void);
 void tracing_stop_cmdline_record(void);
@@ -888,6 +890,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 struct ftrace_graph_ret *trace,
 unsigned int trace_ctx);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
@@ -1000,6 +1005,12 @@ print_graph_function_flags(struct trace_iterator *iter, 
u32 flags)
 {
return TRACE_TYPE_UNHANDLED;
 }
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+   return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..8e8da0d0ee52 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -80,6 +80,7 @@ void ftrace_free_ftrace_ops(struct trace_array *tr)
 int ftrace_create_function_files(struct trace_array *tr,
 struct dentry *parent)
 {
+   int ret;
/*
 * The top level array uses the "global_ops", and the files are
 * created on boot up.
@@ -90,6 +91,12 @@ int ftrace_create_function_files(struct trace_array *tr,
if (!tr->ops)
return -EINVAL;
 
+   ret = allocate_fgraph_ops(tr);
+   if (ret) {
+   kfree(tr->ops);
+   return ret;
+   }
+
ftrace_create_filter_files(tr->ops, parent);
 
return 0;
@@ -99,6 +106,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
 {
ftrace_destroy_filter_files(tr->ops);
ftrace_free_ftrace_ops(tr);
+   free_fgraph_ops(tr);
 }
 
 static ftrace_func_t select_trace_function(u32 flags_val)
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index b7b142b65299..9ccc904a7703 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -83,8 +83,6 @@ static struct tracer_flags tracer_flags = {

[PATCH v4 08/33] ftrace/function_graph: Pass fgraph_ops to function graph callbacks

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that will
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.
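
Once the private field lands (later in this series), a callback can
recover its own instance data directly from the gops argument; a sketch
(the callback body is hypothetical):

static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
{
	struct trace_array *tr = gops->private;	/* set at registration */

	return tr != NULL;	/* e.g. only trace when bound to an instance */
}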

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - cleanup to set argument name on function prototype.
---
 include/linux/ftrace.h   |   10 +++---
 kernel/trace/fgraph.c|   16 +---
 kernel/trace/ftrace.c|6 --
 kernel/trace/trace.h |4 ++--
 kernel/trace/trace_functions_graph.c |   11 +++
 kernel/trace/trace_irqsoff.c |6 --
 kernel/trace/trace_sched_wakeup.c|6 --
 kernel/trace/trace_selftest.c|5 +++--
 8 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8b48fc621ea0..bc216cdc742d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1055,11 +1055,15 @@ struct ftrace_graph_ret {
unsigned long long rettime;
 } __packed;
 
+struct fgraph_ops;
+
 /* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+  struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+ struct fgraph_ops *); /* entry */
 
-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct 
fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 97a9ffb8bb4c..62c35d6d95f9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -129,13 +129,13 @@ static inline int get_fgraph_array(struct task_struct *t, 
int offset)
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
 {
return 0;
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
 }
 
@@ -199,12 +199,14 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+   struct fgraph_ops *gops)
 {
return 0;
 }
 
-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
 {
 }
 
@@ -358,7 +360,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
atomic_inc(&current->trace_overrun);
break;
}
-   if (fgraph_array[i]->entryfunc(&trace)) {
+   if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
offset = current->curr_ret_stack;
/* Check the top level stored word */
type = get_fgraph_type(current, offset - 1);
@@ -532,7 +534,7 @@ static unsigned long __ftrace_return_to_handler(struct 
fgraph_ret_regs *ret_regs
i = 0;
do {
idx = get_fgraph_array(current, offset - i);
-   fgraph_array[idx]->retfunc(&trace);
+   fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
i++;
} while (i < index);
 
@@ -674,7 +676,7 @@ void ftrace_graph_sleep_time_control(bool enable)
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
  */
-extern void ftrace_stub_graph(struct ftrace_graph_ret *);
+void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops 
*gops);
 
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fd64021ec52f..7ff5c454622a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -815,7 +815,8 @@ void ftrace_graph_graph_time_control(bool enable)
fgraph_graph_time = enable;
 }
 
-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+  struct fgraph_ops *gops)
 {
struct ftrace_ret_stack *ret_stack;
 
@@ -832,7 +833,8 @@ static int 

[PATCH v4 07/33] function_graph: Remove logic around ftrace_graph_entry and return

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that will allow for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non dynamic ftrace configs, where a test is made against them to see if
they point to the default stub function or not. This is how the static
function tracing knows to call into the function graph tracer
infrastructure or not.

Two new stub functions are made: entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return are set to them respectively
when the function graph tracer is enabled, and this will trigger the
architecture specific function graph code to be executed.
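
Rendered in C, the static-ftrace test the architecture code performs is
roughly the following (a sketch of the assembly-side logic, not code
added by this patch):

static bool arch_wants_graph_trace(void)
{
	/* entry_run()/return_run() make this true when fgraph is active */
	return ftrace_graph_return != ftrace_stub_graph ||
	       ftrace_graph_entry != ftrace_graph_entry_stub;
}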

This also requires checking the global_ops hash for all calls into the
function_graph tracer.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Fix typo and make lines shorter than 76 chars in the description.
  - Remove unneeded return from return_run() function.
---
 kernel/trace/fgraph.c  |   71 +++-
 kernel/trace/ftrace.c  |2 -
 kernel/trace/ftrace_internal.h |2 -
 3 files changed, 19 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 8aba93be11b2..97a9ffb8bb4c 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -128,6 +128,17 @@ static inline int get_fgraph_array(struct task_struct *t, 
int offset)
FGRAPH_ARRAY_MASK;
 }
 
+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -323,6 +334,10 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
return -EBUSY;
 #endif
+
+   if (!ftrace_ops_test(&global_ops, func, NULL))
+   return -EBUSY;
+
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
@@ -664,7 +679,6 @@ extern void ftrace_stub_graph(struct ftrace_graph_ret *);
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
 trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
 static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -747,46 +761,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
}
 }
 
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-   if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-   return 0;
-   return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-   struct ftrace_ops *op;
-   bool do_test = false;
-
-   /*
-* The graph and global ops share the same set of functions
-* to test. If any other ops is on the list, then
-* the graph tracing needs to test if its the function
-* it should call.
-*/
-   do_for_each_ftrace_op(op, ftrace_ops_list) {
-   if (op != &global_ops && op != &graph_ops &&
-   op != &ftrace_list_end) {
-   do_test = true;
-   /* in double loop, break out with goto */
-   goto out;
-   }
-   } while_for_each_ftrace_op(op);
- out:
-   if (do_test)
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   else
-   ftrace_graph_entry = __ftrace_graph_entry;
-}
-
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
@@ -927,18 +901,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_active--;
goto out;
}
-
-   ftrace_graph_return = gops->retfunc;
-
/*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
+  

[PATCH v4 06/33] function_graph: Allow multiple users to attach to function graph

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the fgraph_ops entryfunc is
called, and if it returns non-zero, its index into the array will be
added to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the ftrace_ops on the array that want their retfunc to be
called.

Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the ftrace_ops
is removed, its place on the array is replaced with a ftrace_ops that
contains the stub functions and that will be called when the function
finally returns.

If another ftrace_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things currently work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
function traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter that is incremented each time the array
item is updated, and saving that on the shadow stack as well, such that
the retfunc won't be called if the saved counter does not match the one
on the array.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
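
To make the encoding concrete, here is a stand-alone sketch of how one of
those shadow-stack words is packed and unpacked. The constants mirror the
ones in the patch (the diagram in the patch uses FGRAPH_ARRAY_SHIFT for the
bit-16 shift); the helper and main() below are invented purely for
illustration:

#include <stdio.h>

#define FGRAPH_RET_INDEX_SIZE   14
#define FGRAPH_RET_INDEX_MASK   ((1 << FGRAPH_RET_INDEX_SIZE) - 1)
#define FGRAPH_TYPE_SIZE        2
#define FGRAPH_TYPE_MASK        ((1 << FGRAPH_TYPE_SIZE) - 1)
#define FGRAPH_TYPE_SHIFT       FGRAPH_RET_INDEX_SIZE
#define FGRAPH_TYPE_ARRAY       1
#define FGRAPH_ARRAY_SHIFT      (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)

/* Encode "fgraph_array[idx] wants its retfunc; the saved
 * ftrace_ret_stack starts 'offset' words below this entry". */
static unsigned long fgraph_encode(int idx, int offset)
{
        return ((unsigned long)idx << FGRAPH_ARRAY_SHIFT) |
               (FGRAPH_TYPE_ARRAY << FGRAPH_TYPE_SHIFT) |
               (offset & FGRAPH_RET_INDEX_MASK);
}

int main(void)
{
        /* fourth fgraph_ops (index 3), frame two words away */
        unsigned long val = fgraph_encode(3, 2);

        printf("offset=%lu type=%lu array index=%lu\n",
               val & FGRAPH_RET_INDEX_MASK,
               (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK,
               val >> FGRAPH_ARRAY_SHIFT); /* offset=2 type=1 index=3 */
        return 0;
}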

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Check return value of the ftrace_pop_return_trace() instead of 'ret'
since 'ret' is set to the address of panic().
  - Fix typo and make lines shorter than 76 chars in description.
---
 kernel/trace/fgraph.c |  332 +
 1 file changed, 280 insertions(+), 52 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 86df3ca6964f..8aba93be11b2 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -27,23 +27,144 @@
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack, then each fgraph_ops on the
+ * fgraph_array[]'s entryfunc is called and if that returns non-zero, the
+ * index into the fgraph_array[] for that fgraph_ops is added to the ret_stack.
+ * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
+ * be found, the index to it is also added to the ret_stack along with the
+ * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
+ * called.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack have the format of:
+ *
+ * bits:  0 - 13   Index in words from the previous ftrace_ret_stack
+ * bits: 14 - 15   Type of storage
+ *   0 - reserved
+ *   1 - fgraph_array index
+ * For fgraph_array_index:
+ *  bits: 16 - 23  The fgraph_ops fgraph_array index
+ *
+ * That is, at the end of function_graph_enter, if the first and fourth
+ * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
+ * on the return of the function being traced, this is what will be on the
+ * task's shadow ret_stack: (the stack grows upward)
+ *
+ * |  | <- task->curr_ret_stack
+ * +--+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|(2)| ( 3 for index of fourth fgraph_ops)
+ * +--+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|(1)| ( 0 for index of first fgraph_ops)
+ * +--+
+ * | struct ftrace_ret_stack  |
+ * |   (stores the saved ret pointer) |
+ * +--+
+ * | (X) | (N)| ( N words away from previous ret_stack)
+ * |  |
+ *
+ * If a backtrace is required, and the real return pointer needs to be
+ * fetched, then it looks at the task's curr_ret_stack index, if it
+ * is greater than zero, it would subtract one, and then mask the value
+ * on the ret_stack by FGRAPH_RET_INDEX_MASK and subtract FGRAPH_RET_INDEX
+ * from that, to get the index of the ftrace_ret_stack structure stored
+ * on the shadow stack.
+ */
+
+#define FGRAPH_RET_INDEX_SIZE  14
+#define FGRAPH_RET_INDEX_MASK  ((1 << FGRAPH_RET_INDEX_SIZE) - 1)
+
+
+#define FGRAPH_TYPE_SIZE   2
+#define FGRAPH_TYPE_MASK   ((1 << FGRAPH_TYPE_SIZE) - 1)
+#define FGRAPH_TYPE_SHIFT  FGRAPH_RET_INDEX_SIZE
+
+enum {
+   FGRAPH_TYPE_RESERVED= 0,
+   FGRAPH_TYPE_ARRAY   = 1,
+};
+
+#define FGRAPH_ARRAY_SIZE  16
+#define 

[PATCH v4 05/33] function_graph: Add an array structure that will allow multiple callbacks

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers; a slot is assigned when one is registered. On entry of
a function, the entryfunc of the first item in the array is called; the
callback returns non-zero if it wants the return callback to be called on
exit of the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We only need to save the index, because this allows
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule and sleeps for a long time).
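
As a usage sketch under this new API (hypothetical module code; it assumes
unregister_ftrace_graph() takes the same fgraph_ops pointer, and omits
error handling):

static int my_entry(struct ftrace_graph_ent *trace)
{
        /* return non-zero to get my_return() on exit of trace->func */
        return 1;
}

static void my_return(struct ftrace_graph_ret *trace)
{
        /* called on function exit; inspect trace->func, trace->rettime, ... */
}

static struct fgraph_ops my_gops = {
        .entryfunc      = my_entry,
        .retfunc        = my_return,
};

static int __init my_tracer_init(void)
{
        /* takes a free slot in fgraph_array[] */
        return register_ftrace_graph(&my_gops);
}

static void __exit my_tracer_exit(void)
{
        /* the slot is pointed back at fgraph_stub */
        unregister_ftrace_graph(&my_gops);
}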

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Remove unneeded brace.
---
 kernel/trace/fgraph.c |  114 +++--
 1 file changed, 81 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 837daf929d2a..86df3ca6964f 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -39,6 +39,11 @@
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
+static int fgraph_array_cnt;
+#define FGRAPH_ARRAY_SIZE  16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -62,6 +67,20 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+   .entryfunc = ftrace_graph_entry_stub,
+   .retfunc = ftrace_graph_ret_stub,
+};
+
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
  *
@@ -159,7 +178,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
goto out;
 
/* Only trace if the calling function expects to */
-   if (!ftrace_graph_entry(&trace))
+   if (!fgraph_array[0]->entryfunc(&trace))
goto out_ret;
 
return 0;
@@ -274,7 +293,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
trace.rettime = trace_clock_local();
-   ftrace_graph_return(&trace);
+   fgraph_array[0]->retfunc(&trace);
/*
 * The ftrace_graph_return() may still access the current
 * ret_stack structure, we need to make sure the update of
@@ -410,11 +429,6 @@ void ftrace_graph_sleep_time_control(bool enable)
fgraph_sleep_time = enable;
 }
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-   return 0;
-}
-
 /*
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
@@ -652,37 +666,54 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
int ret = 0;
+   int i;
 
mutex_lock(_lock);
 
-   /* we currently allow only one tracer registered at a time */
-   if (ftrace_graph_active) {
+   if (!fgraph_array[0]) {
+   /* The array must always have real data on it */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+   fgraph_array[i] = &fgraph_stub;
+   }
+
+   /* Look for an available spot */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+   if (fgraph_array[i] == &fgraph_stub)
+   break;
+   }
+   if (i >= FGRAPH_ARRAY_SIZE) {
ret = -EBUSY;
goto out;
}
 
-   register_pm_notifier(&ftrace_suspend_notifier);
+   fgraph_array[i] = gops;
+   if (i + 1 > fgraph_array_cnt)
+   fgraph_array_cnt = i + 1;
 
ftrace_graph_active++;
-   ret = start_graph_tracing();
-   if (ret) {
-   ftrace_graph_active--;
-   goto out;
-   }
 
-   ftrace_graph_return = gops->retfunc;
+   if (ftrace_graph_active == 1) {
+   register_pm_notifier(&ftrace_suspend_notifier);
+   ret = start_graph_tracing();
+   if (ret) {
+   ftrace_graph_active--;
+   goto out;
+   }
+
+   ftrace_graph_return = gops->retfunc;
 
-   /*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
-*/
-   __ftrace_graph_entry = gops->entryfunc;
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   update_function_graph_func();
+   /*
+  

[PATCH v4 04/33] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should
always be divisible by sizeof(long).
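
For reference, the check is purely compile-time; a hedged stand-alone
equivalent of what BUILD_BUG_ON() enforces here, using C11 _Static_assert
(illustration only, not from the patch):

/* As in the patch: break the build if the condition is true, i.e. if
 * the struct size is not a multiple of sizeof(long). */
BUILD_BUG_ON(sizeof(struct ftrace_ret_stack) % sizeof(long));

/* Stand-alone C11 equivalent: */
_Static_assert(sizeof(struct ftrace_ret_stack) % sizeof(long) == 0,
               "ftrace_ret_stack size must be a multiple of sizeof(long)");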

Link: http://lkml.kernel.org/r/2019052444.gi2...@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 kernel/trace/fgraph.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 30edeb6d4aa9..837daf929d2a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -26,10 +26,9 @@
 #endif
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
-#define SHADOW_STACK_INDEX \
-   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
 #define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
 
@@ -91,6 +90,8 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
if (!current->ret_stack)
return -EBUSY;
 
+   BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
+
/*
 * We must make sure the ret_stack is tested before we read
 * anything else.
@@ -325,6 +326,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
int index = task->curr_ret_stack;
 
+   BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
index -= FGRAPH_RET_INDEX * (idx + 1);
if (index < 0)
return NULL;




[PATCH v4 03/33] function_graph: Convert ret_stack to a series of longs

2023-12-08 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.
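
To make the conversion concrete, here is a sketch of the resulting
bookkeeping, mirroring the RET_STACK(), RET_STACK_INC() and RET_STACK_DEC()
macros added below (illustrative pseudo-usage, not new API):
curr_ret_stack now counts longs rather than frames, and each frame
occupies FGRAPH_RET_INDEX words.

        /* A frame starts at word 'index' of the long array: */
        struct ftrace_ret_stack *frame =
                (struct ftrace_ret_stack *)&current->ret_stack[index];

        /* Push: note the start of the new frame, advance by one frame */
        index = current->curr_ret_stack;
        current->curr_ret_stack += FGRAPH_RET_INDEX;    /* RET_STACK_INC() */

        /* Pop: step back over the top frame */
        current->curr_ret_stack -= FGRAPH_RET_INDEX;    /* RET_STACK_DEC() */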

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/sched.h |2 -
 kernel/trace/fgraph.c |  124 -
 2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..4dab30f00211 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1390,7 +1390,7 @@ struct task_struct {
int curr_ret_depth;
 
/* Stack of return addresses for return function tracing: */
-   struct ftrace_ret_stack *ret_stack;
+   unsigned long   *ret_stack;
 
/* Timestamp for last schedule: */
unsigned long long  ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c83c005e654e..30edeb6d4aa9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,6 +25,18 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX \
+   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -69,6 +81,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp)
 {
+   struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
int index;
 
@@ -85,23 +98,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
smp_rmb();
 
/* The return trace stack is full */
-   if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+   if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
atomic_inc(&current->trace_overrun);
return -EBUSY;
}
 
calltime = trace_clock_local();
 
-   index = ++current->curr_ret_stack;
+   index = current->curr_ret_stack;
+   RET_STACK_INC(current->curr_ret_stack);
+   ret_stack = RET_STACK(current, index);
barrier();
-   current->ret_stack[index].ret = ret;
-   current->ret_stack[index].func = func;
-   current->ret_stack[index].calltime = calltime;
+   ret_stack->ret = ret;
+   ret_stack->func = func;
+   ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-   current->ret_stack[index].fp = frame_pointer;
+   ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-   current->ret_stack[index].retp = retp;
+   ret_stack->retp = retp;
 #endif
return 0;
 }
@@ -148,7 +163,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 
return 0;
  out_ret:
-   current->curr_ret_stack--;
+   RET_STACK_DEC(current->curr_ret_stack);
  out:
current->curr_ret_depth--;
return -EBUSY;
@@ -159,11 +174,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
unsigned long frame_pointer)
 {
+   struct ftrace_ret_stack *ret_stack;
int index;
 
index = current->curr_ret_stack;
+   RET_STACK_DEC(index);
 
-   if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+   if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
ftrace_graph_stop();
WARN_ON(1);
/* Might as well panic, otherwise we have no where to go */
@@ -171,6 +188,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
return;
}
 
+   ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
/*
 * The arch may choose to record the frame pointer used
@@ -186,22 +204,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 * Note, -mfentry does not use frame pointers, and this test
 *  is not needed if CC_USING_FENTRY is set.
 */
-   if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+   if (unlikely(ret_stack->fp != frame_pointer)) {
ftrace_graph_stop();
WARN(1, "Bad frame 

[PATCH v4 02/33] x86: tracing: Add ftrace_regs definition in the header

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_regs definition for x86_64 in the ftrace header to
clarify which registers will be accessible from ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Add rip to be saved.
 Changes in v2:
  - Newly added.
---
 arch/x86/include/asm/ftrace.h |6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..415cf7a2ec2c 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -36,6 +36,12 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 
 #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 struct ftrace_regs {
+   /*
+* On the x86_64, the ftrace_regs saves;
+* rax, rcx, rdx, rdi, rsi, r8, r9, rbp, rip and rsp.
+* Also orig_ax is used for passing direct trampoline address.
+* x86_32 doesn't support ftrace_regs.
+*/
struct pt_regs  regs;
 };
 




[PATCH v4 01/33] tracing: Add a comment about ftrace_regs definition

2023-12-08 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

To clarify what will be expected on ftrace_regs, add a comment to the
architecture independent definition of the ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Add instruction pointer
 Changes in v2:
  - newly added.
---
 include/linux/ftrace.h |   26 ++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e8921871ef9a..8b48fc621ea0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -118,6 +118,32 @@ extern int ftrace_enabled;
 
 #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 
+/**
+ * ftrace_regs - ftrace partial/optimal register set
+ *
+ * ftrace_regs represents a group of registers which is used at the
+ * function entry and exit. There are three types of registers.
+ *
+ * - Registers for passing the parameters to callee, including the stack
+ *   pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
+ * - Registers for passing the return values to caller.
+ *   (e.g. rax and rdx on x86_64)
+ * - Registers for hooking the function call and return including the
+ *   frame pointer (the frame pointer is architecture/config dependent)
+ *   (e.g. rip, rbp and rsp for x86_64)
+ *
+ * Also, architecture dependent fields can be used for internal processing.
+ * (e.g. orig_ax on x86_64)
+ *
+ * On the function entry, those registers will be restored except for
+ * the stack pointer, so that the user can change the function parameters
+ * and instruction pointer (e.g. live patching.)
+ * On the function exit, only the registers which are used for return
+ * values are restored.
+ *
+ * NOTE: user *must not* access regs directly, only do it via APIs, because
+ * the member can be changed according to the architecture.
+ */
 struct ftrace_regs {
struct pt_regs  regs;
 };
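
As a consumer-side sketch of "only via APIs" (the callback is hypothetical;
ftrace_get_regs() is the accessor that yields a full pt_regs only when the
architecture saved a complete set, e.g. for an ftrace_ops registered with
FTRACE_OPS_FL_SAVE_REGS):

static void my_callback(unsigned long ip, unsigned long parent_ip,
                        struct ftrace_ops *op, struct ftrace_regs *fregs)
{
        struct pt_regs *regs = ftrace_get_regs(fregs);

        if (!regs)
                return; /* only a partial register set was saved */

        /* a complete pt_regs was captured; regs may be used normally */
}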




[PATCH v4 00/33] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2023-12-08 Thread Masami Hiramatsu (Google)
Hi,

Here is the 4th version of the series to re-implement the fprobe on
function-graph tracer. The previous version is;

https://lore.kernel.org/all/170109317214.343914.4784420430328654397.stgit@devnote2/

This version is rebased on "trace-v6.7-rc4" tag on linux-trace tree
and fix some issues on the new multiple function graph tracer
implementation[11/33].  I also add a simple LRU so that the recently 
released fgraph_array index is not reused soon [12/33].


Overview

This series makes 2 major changes: enabling multiple function-graphs on
ftrace (e.g. allowing function-graph on sub instances) and rewriting
fprobe on top of this function-graph.

The former change was first sent by Steven Rostedt 4 years ago (*);
it allows users to use different settings of the function-graph tracer
(and other tracers based on function-graph) in each trace instance at the
same time.

(*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/

The purposes of the latter change are:

 1) Remove dependency of the rethook from fprobe so that we can reduce
   the return hook code and shadow stack.

 2) Make 'ftrace_regs' the common trace interface for the function
   boundary.

1) Currently we have 2 (or 3) different function return hook codes:
 the function-graph tracer and rethook (and the legacy kretprobe).
 Since this is redundant and doubles the maintenance cost, I would
 like to unify them. From the user's viewpoint, the function-graph
 tracer is very useful for grasping the execution path. For this
 purpose, it is hard to use the rethook in the function-graph
 tracer, but the opposite is possible. (Strictly speaking, kretprobe
 cannot use it because it requires 'pt_regs' for historical reasons.)

2) Now the fprobe provides the 'pt_regs' for its handler, but that is
 wrong for the function entry and exit. Moreover, depending on the
 architecture, there is no way to accurately reproduce 'pt_regs'
 outside of interrupt or exception handlers. This means fprobe should
 not use 'pt_regs' because it does not use such exceptions.
 (Conversely, kprobe should use 'pt_regs' because it is an abstract
  interface of the software breakpoint exception.)

This series changes fprobe to use function-graph tracer for tracing
function entry and exit, instead of mixture of ftrace and rethook.
Unlike the rethook which is a per-task list of system-wide allocated
nodes, the function graph's ret_stack is a per-task shadow stack.
Thus it does not need to set 'nr_maxactive' (which is the number of
pre-allocated nodes).
Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
their register interface, this series converts 'ftrace_regs' to
'pt_regs' for them. Of course this conversion makes an incomplete
'pt_regs', so users must access only the registers used for function
parameters or return values.
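
For illustration, here is roughly what that conversion looks like from a
handler's point of view. This is a hedged sketch: the entry-handler
prototype is an assumption based on this series, and ftrace_partial_regs()
fills a pt_regs from the ftrace_regs with only a partial set of registers
valid:

static int my_entry_handler(struct fprobe *fp, unsigned long entry_ip,
                            unsigned long ret_ip,
                            struct ftrace_regs *fregs, void *entry_data)
{
        struct pt_regs regs_buf; /* real code may prefer a per-CPU buffer */
        struct pt_regs *regs = ftrace_partial_regs(fregs, &regs_buf);

        /*
         * 'regs' is incomplete: only the registers used for function
         * parameters, return values and the stack pointer are valid here.
         */
        return 0;
}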

Design
--
Instead of using ftrace's function entry hook directly, the new fprobe
is built on top of the function-graph's entry and return callbacks
with 'ftrace_regs'.

Since the fprobe requires access to 'ftrace_regs', the architecture
must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, which enables
calling the function-graph entry callback with 'ftrace_regs', and also
CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
return_to_handler.

All fprobes share a single function-graph ops (meaning they share a
common ftrace filter), similar to kprobe-on-ftrace. This needs another
layer to find the corresponding fprobe in the common function-graph
callbacks, but it has much better scalability, since the number of
registered function-graph ops is limited.

In the entry callback, the fprobe runs its entry_handler and saves the
address of 'fprobe' on the function-graph's shadow stack as data. The
return callback decodes the data to get the 'fprobe' address, and runs
the exit_handler.

The fprobe introduces two hash tables: one for the entry callback,
which looks up the fprobes related to the given function address passed
by the entry callback, and one for the return callback, which checks
that the given 'fprobe' data structure pointer is still valid. Note that
it is possible to unregister an fprobe before the return callback runs,
so the address validation must be done before using it in the return
callback.

Series
--
- Patch [1/33] and [2/33] are adding a comment for ftrace_regs.
- Patch [3/33] to [18/33] are the multiple function-graph support.
- Patch [19/33] and [20/33] adds new function-graph callbacks with
  ftrace_regs and x86-64 implementation.
- Patch [21/33] to [27/33] are preparation (adding util functions) of
  the new fprobe and its user.
- Patch [28/33] to [32/33] rewrites fprobes and updates its users.
- Patch [33/33] is a documentation update.

This series can be applied against the trace-v6.7-rc4 on linux-trace tree.

This series can also be found below branch.

https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph

Thank you,

---


Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-12-08 Thread Tobias Huschle
On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
> > 3. vhost looping endlessly, waiting for kworker to be scheduled
> > 
> > I dug a little deeper on what the vhost is doing. I'm not an expert on
> > virtio whatsoever, so these are just educated guesses that maybe
> > someone can verify/correct. Please bear with me probably messing up 
> > the terminology.
> > 
> > - vhost is looping through available queues.
> > - vhost wants to wake up a kworker to process a found queue.
> > - kworker does something with that queue and terminates quickly.
> > 
> > What I found by throwing in some very noisy trace statements was that,
> > if the kworker is not woken up, the vhost just keeps looping across
> > all available queues (and seems to repeat itself). So it essentially
> > relies on the scheduler to schedule the kworker fast enough. Otherwise
> > it will just keep on looping until it is migrated off the CPU.
> 
> 
> Normally it takes the buffers off the queue and is done with it.
> I am guessing that at the same time guest is running on some other
> CPU and keeps adding available buffers?
> 

It seems to do just that; there are multiple other vhost instances
involved which might keep filling up those queues.

Unfortunately, this makes the problematic vhost instance stay on
the CPU and prevents said kworker from getting scheduled. The kworker is
explicitly woken up by vhost, so vhost wants it to do something.

At this point it seems that there is an assumption about the scheduler
in place which is no longer fulfilled by EEVDF. From the discussion so
far, it seems like EEVDF does what it is intended to do.

Shouldn't there be a more explicit mechanism in use that allows the
kworker to be scheduled in favor of the vhost?

It is also concerning that the vhost apparently cannot be preempted by the
scheduler while executing that loop.



Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

2023-12-08 Thread Zhijian Li (Fujitsu)


On 08/12/2023 03:25, Verma, Vishal L wrote:
> On Thu, 2023-12-07 at 08:29 +, Zhijian Li (Fujitsu) wrote:
>> Hi Vishal,
>>
>>
>> On 07/12/2023 12:36, Vishal Verma wrote:
>>> +
>>> +What:  /sys/bus/dax/devices/daxX.Y/memmap_on_memory
>>> +Date:  October, 2023
>>> +KernelVersion: v6.8
>>> +Contact:   nvd...@lists.linux.dev
>>> +Description:
>>> +   (RW) Control the memmap_on_memory setting if the dax device
>>> +   were to be hotplugged as system memory. This determines 
>>> whether
>>> +   the 'altmap' for the hotplugged memory will be placed on the
>>> +   device being hotplugged (memmap_on+memory=1) or if it will 
>>> be
>>
>> s/memmap_on+memory=1/memmap_on_memory=1
> 
> Thanks, will fix.
>>
>>
>> I have a question here
>>
>> What is the relationship between memmap_on_memory and the
>> 'ndctl-create-namespace -M' option, which
>> specifies where the vmemmap backing memory is placed?
>> I'm confused whether memmap_on_memory=1 and '-M dev' are the same for
>> nvdimm devdax mode?
>>
> The main difference is that memory that comes from non-nvdimm sources,
> such as hmem or cxl, doesn't have a chance to set up the altmaps as
> pmem can with '-M dev'. For these, memmap_on_memory does this as part
> of the memory hotplug.

Thanks for your explanation.
(I wrongly thought nvdimm.kmem was also controlled by '-M dev' before)

feel free add:
Tested-by: Li Zhijian 

Re: [PATCH v10 3/3] dax/kmem: allow kmem to add memory with memmap_on_memory

2023-12-08 Thread Zhijian Li (Fujitsu)


On 07/11/2023 15:22, Vishal Verma wrote:
> Large amounts of memory managed by the kmem driver may come in via CXL,
> and it is often desirable to have the memmap for this memory on the new
> memory itself.
> 
> Enroll kmem-managed memory for memmap_on_memory semantics if the dax
> region originates via CXL. For non-CXL dax regions, retain the existing
> default behavior of hot adding without memmap_on_memory semantics.
> 
> Cc: Andrew Morton 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: "Huang, Ying" 
> Signed-off-by: Vishal Verma 


Tested-by: Li Zhijian   # both cxl.kmem and nvdimm.kmem




> ---
>   drivers/dax/bus.h | 1 +
>   drivers/dax/dax-private.h | 1 +
>   drivers/dax/bus.c | 3 +++
>   drivers/dax/cxl.c | 1 +
>   drivers/dax/hmem/hmem.c   | 1 +
>   drivers/dax/kmem.c| 8 +++-
>   drivers/dax/pmem.c| 1 +
>   7 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 1ccd23360124..cbbf64443098 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -23,6 +23,7 @@ struct dev_dax_data {
>   struct dev_pagemap *pgmap;
>   resource_size_t size;
>   int id;
> + bool memmap_on_memory;
>   };
>   
>   struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 27cf2d79..446617b73aea 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -70,6 +70,7 @@ struct dev_dax {
>   struct ida ida;
>   struct device dev;
>   struct dev_pagemap *pgmap;
> + bool memmap_on_memory;
>   int nr_range;
>   struct dev_dax_range {
>   unsigned long pgoff;
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1659b787b65f..1ff1ab5fa105 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -367,6 +367,7 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
>   .dax_region = dax_region,
>   .size = 0,
>   .id = -1,
> + .memmap_on_memory = false,
>   };
>   struct dev_dax *dev_dax = devm_create_dev_dax(&data);
>   
> @@ -1400,6 +1401,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>   dev_dax->align = dax_region->align;
>   ida_init(_dax->ida);
>   
> + dev_dax->memmap_on_memory = data->memmap_on_memory;
> +
>   inode = dax_inode(dax_dev);
>   dev->devt = inode->i_rdev;
>   dev->bus = _bus_type;
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 8bc9d04034d6..c696837ab23c 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -26,6 +26,7 @@ static int cxl_dax_region_probe(struct device *dev)
>   .dax_region = dax_region,
>   .id = -1,
> + .size = range_len(&cxlr_dax->hpa_range),
> + .memmap_on_memory = true,
>   };
>   
>   return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 5d2ddef0f8f5..b9da69f92697 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -36,6 +36,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
>   .dax_region = dax_region,
>   .id = -1,
> + .size = region_idle ? 0 : range_len(&mri->range),
> + .memmap_on_memory = false,
>   };
>   
>   return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 369c698b7706..42ee360cf4e3 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -12,6 +12,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include "dax-private.h"
>   #include "bus.h"
>   
> @@ -93,6 +94,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>   struct dax_kmem_data *data;
>   struct memory_dev_type *mtype;
>   int i, rc, mapped = 0;
> + mhp_t mhp_flags;
>   int numa_node;
>   int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
>   
> @@ -179,12 +181,16 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>*/
>   res->flags = IORESOURCE_SYSTEM_RAM;
>   
> + mhp_flags = MHP_NID_IS_MGID;
> + if (dev_dax->memmap_on_memory)
> + mhp_flags |= MHP_MEMMAP_ON_MEMORY;
> +
>   /*
>* Ensure that future kexec'd kernels will not treat
>* this as RAM automatically.
>*/
>   rc = add_memory_driver_managed(data->mgid, range.start,
> - range_len(&range), kmem_name, MHP_NID_IS_MGID);
> + range_len(&range), kmem_name, mhp_flags);
>   
>   if (rc) {
>