Re: [linux-next:master] BUILD REGRESSION ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f

2024-06-06 Thread Steven Rostedt


Is there something new since the last report?

 https://lore.kernel.org/all/202406060125.8ggeepjs-...@intel.com/

If not, can we please not keep spamming, as it makes it harder to know
what to fix and what has been fixed.

-- Steve


On Fri, 07 Jun 2024 01:15:11 +0800
kernel test robot  wrote:

> tree/branch: 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f  Add linux-next 
> specific files for 20240606
> 
> Error/Warning reports:
> 
> https://lore.kernel.org/oe-kbuild-all/202406061744.rzdxfrrg-...@intel.com
> 
> Error/Warning: (recently discovered and may have been fixed)
> 
> kernel/trace/trace_selftest.c:761: warning: "BYTE_NUMBER" redefined
> 
> Unverified Error/Warning (likely false positive, please contact us if 
> interested):
> 
> arch/microblaze/boot/dts/system.dts:20.9-23.4: Warning (unit_address_vs_reg): 
> /memory: node has a reg or ranges property, but no unit name
> arch/microblaze/boot/dts/system.dts:272.4-19: Warning (clocks_property): 
> /amba_pl/dma@41e0:clocks: cell 0 is not a phandle reference
> arch/microblaze/boot/dts/system.dts:284.4-19: Warning (clocks_property): 
> /amba_pl/timer@41c0:clocks: cell 0 is not a phandle reference
> arch/microblaze/boot/dts/system.dts:339.4-19: Warning (clocks_property): 
> /amba_pl/i2c@4080:clocks: cell 0 is not a phandle reference
> arch/microblaze/boot/dts/system.dts:483.25-486.6: Warning 
> (unit_address_format): /amba_pl/flash@6000/partition@0x: unit 
> name should not have leading "0x"
> arch/microblaze/boot/dts/system.dts:483.25-486.6: Warning 
> (unit_address_format): /amba_pl/flash@6000/partition@0x: unit 
> name should not have leading 0s
> arch/microblaze/boot/dts/system.dts:50.4-19: Warning (clocks_property): 
> /cpus/cpu@0:clocks: cell 0 is not a phandle reference
> arch/microblaze/boot/dts/system.dts:560.4-19: Warning (clocks_property): 
> /amba_pl/serial@44a0:clocks: cell 0 is not a phandle reference
> arch/microblaze/boot/dts/system.dts:579.15-588.4: Warning (simple_bus_reg): 
> /amba_pl/gpio-restart: missing or empty reg/ranges property
> 
> Error/Warning ids grouped by kconfigs:
> 
> gcc_recent_errors
> |-- alpha-randconfig-r123-20240606
> |   `-- drivers-hwmon-cros_ec_hwmon.c:sparse:sparse:cast-to-restricted-__le16
> |-- loongarch-defconfig
> |   |-- 
> drivers-gpu-drm-amd-amdgpu-..-display-dc-hubbub-dcn401-dcn401_hubbub.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
> |   `-- 
> drivers-thermal-thermal_trip.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
> |-- microblaze-buildonly-randconfig-r001-20230308
> |   |-- 
> arch-microblaze-boot-dts-system.dts.-.:Warning-(simple_bus_reg):amba_pl-gpio-restart:missing-or-empty-reg-ranges-property
> |   |-- 
> arch-microblaze-boot-dts-system.dts.-.:Warning-(unit_address_format):amba_pl-flash-partition:unit-name-should-not-have-leading
> |   |-- 
> arch-microblaze-boot-dts-system.dts.-.:Warning-(unit_address_format):amba_pl-flash-partition:unit-name-should-not-have-leading-0s
> |   |-- 
> arch-microblaze-boot-dts-system.dts.-.:Warning-(unit_address_vs_reg):memory:node-has-a-reg-or-ranges-property-but-no-unit-name
> |   |-- 
> arch-microblaze-boot-dts-system.dts.:Warning-(clocks_property):amba_pl-dma-41e0:clocks:cell-is-not-a-phandle-reference
> |   |-- 
> arch-microblaze-boot-dts-system.dts.:Warning-(clocks_property):amba_pl-i2c:clocks:cell-is-not-a-phandle-reference
> |   |-- 
> arch-microblaze-boot-dts-system.dts.:Warning-(clocks_property):amba_pl-serial-44a0:clocks:cell-is-not-a-phandle-reference
> |   |-- 
> arch-microblaze-boot-dts-system.dts.:Warning-(clocks_property):amba_pl-timer-41c0:clocks:cell-is-not-a-phandle-reference
> |   `-- 
> arch-microblaze-boot-dts-system.dts.:Warning-(clocks_property):cpus-cpu:clocks:cell-is-not-a-phandle-reference
> |-- openrisc-randconfig-r121-20240606
> |   `-- 
> drivers-clk-qcom-camcc-sm7150.c:sparse:sparse:symbol-camcc_sm7150_hws-was-not-declared.-Should-it-be-static
> |-- sh-randconfig-c004-20211223
> |   `-- kernel-trace-trace_selftest.c:warning:BYTE_NUMBER-redefined
> |-- sh-randconfig-r111-20240606
> |   |-- 
> include-linux-container_of.h:error:struct-ftrace_ops-has-no-member-named-list
> |   |-- include-linux-list.h:error:struct-ftrace_ops-has-no-member-named-list
> |   |-- 
> include-linux-stddef.h:error:struct-ftrace_ops-has-no-member-named-list
> |   |-- 
> kernel-trace-fgraph.c:error:implicit-declaration-of-function-ftrace_shutdown_subops
> |   |-- 
> kernel-trace-fgraph.c:error:implicit-declaration-of-function-ftrace_startup_subops
> |   `-- 
> kernel-trace-fgraph.c:error:struct-ftrace_ops-has-no-member-named-subop_list
> |-- um-allyesconfig
> |   `-- 
> kernel-bpf-verifier.c:error:pcpu_hot-undeclared-(first-use-in-this-function)
> `-- x86_64-randconfig-161-20240606
> |-- 
> drivers-gpu-drm-amd-amdgpu-amdgpu_vm.c-amdgpu_vm_bo_update()-err

Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-19 Thread Steven Rostedt
On Fri, 17 May 2024 10:36:37 -0700
Guenter Roeck  wrote:

> Building csky:allmodconfig (and others) ... failed
> --
> Error log:
> In file included from include/trace/trace_events.h:419,
>  from include/trace/define_trace.h:102,
>  from drivers/cxl/core/trace.h:737,
>  from drivers/cxl/core/trace.c:8:
> drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
> arguments, but takes just 1
> 
> This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
> So far that seems to be the only build failure.
> Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
> cxl_general_media and cxl_dram events"). Guess we'll see more of those
> towards the end of the commit window.

Looks like I made this patch just before this commit was pulled into
Linus's tree.

Which is why I'll apply and rerun the above again probably on Tuesday of
next week against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on
the commit I had applied it to, which was:

  1b294a1f3561 ("Merge tag 'net-next-6.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.

-- Steve


Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-03-15 Thread Steven Rostedt
On Thu, 14 Mar 2024 09:57:57 -0700
Alison Schofield  wrote:

> On Fri, Feb 23, 2024 at 12:56:34PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > [
> >This is a treewide change. I will likely re-create this patch again in
> >the second week of the merge window of v6.9 and submit it then. Hoping
> >to keep the conflicts that it will cause to a minimum.
> > ]

Note, change of plans. I plan on sending this in the next merge window, as
this merge window I have this patch:

  
https://lore.kernel.org/linux-trace-kernel/20240312113002.00031...@gandalf.local.home/

That will warn if the source string of __string() is different than the
source string of __assign_str(). I want to make sure they are identical
before just dropping one of them.


> 
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index bdf117a33744..07ba4e033347 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h  
> 
> snip to poison
> 
> > @@ -668,8 +668,8 @@ TRACE_EVENT(cxl_poison,
> > ),
> >  
> > TP_fast_assign(
> > -   __assign_str(memdev, dev_name(&cxlmd->dev));
> > -   __assign_str(host, dev_name(cxlmd->dev.parent));
> > +   __assign_str(memdev);
> > +   __assign_str(host);  
> 
> I think I get that the above changes work because the TP_STRUCT__entry for
> these did:
>   __string(memdev, dev_name(&cxlmd->dev))
>   __string(host, dev_name(cxlmd->dev.parent))

That's the point. They have to be identical or you will likely bug.

The __string(name, src) is used to find the string length of src which
allocates the necessary length on the ring buffer. The __assign_str(name, src)
will copy src into the ring buffer.

Similar to:

len = strlen(src);
buf = malloc(len);
strcpy(buf, str);

Where __string() is strlen() and __assign_str() is strcpy(). It doesn't
make sense to use two different strings, and if you did, it would likely be
a bug.

But the magic behind __string() does much more than just get the length of
the string, and it could easily save the pointer to the string (along with
its length) and have it copy that in the __assign_str() call, making the
src parameter of __assign_str() useless.

> 
> > __entry->serial = cxlmd->cxlds->serial;
> > __entry->overflow_ts = cxl_poison_overflow(flags, overflow_ts);
> > __entry->dpa = cxl_poison_record_dpa(record);
> > @@ -678,12 +678,12 @@ TRACE_EVENT(cxl_poison,
> > __entry->trace_type = trace_type;
> > __entry->flags = flags;
> > if (region) {
> > -   __assign_str(region, dev_name(®ion->dev));
> > +   __assign_str(region);
> > memcpy(__entry->uuid, ®ion->params.uuid, 16);
> > __entry->hpa = cxl_trace_hpa(region, cxlmd,
> >  __entry->dpa);
> > } else {
> > -   __assign_str(region, "");
> > +   __assign_str(region);
> > memset(__entry->uuid, 0, 16);
> > __entry->hpa = ULLONG_MAX;  
> 
> For the above 2, there was no helper in TP_STRUCT__entry. A recently
> posted patch is fixing that up to be __string(region, NULL) See [1],
> with the actual assignment still happening in TP_fast_assign.

__string(region, NULL) doesn't make sense. It's like:

len = strlen(NULL);
buf = malloc(len);
strcpy(buf, NULL);

??

I'll reply to that email.

-- Steve

> 
> Does that assign logic need to move to the TP_STRUCT__entry definition
> when you merge these changes? I'm not clear how much logic is able to be
> included, ie like 'C' style code in the TP_STRUCT__entry.
> 
> [1]
> https://lore.kernel.org/linux-cxl/20240314044301.2108650-1-alison.schofi...@intel.com/


Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-26 Thread Steven Rostedt
On Fri, 23 Feb 2024 14:50:49 -0500
Kent Overstreet  wrote:

> Tangentially related though, what would make me really happy is if we
> could create the string with in the TP__fast_assign() section. I have to
> have a bunch of annoying wrappers right now because the string length
> has to be known when we invoke the tracepoint.

You can use __string_len() to determine the string length in the tracepoint
(which is executed in the TP_fast_assign() section).

My clean up patches will make __assign_str_len() obsolete too (I'm working
on them now), and you can just use __assign_str().

I noticed that I don't have a string_len example in the sample code and I'm
actually writing it now.

// cutting out everything else:

TRACE_EVENT(foo_bar,

TP_PROTO(const char *foo, int bar),

TP_ARGS(foo, bar),

TP_STRUCT__entry(
__string_len(   lstr,   foo,bar < strlen(foo) ? bar : 
strlen(foo) )
),

TP_fast_assign(
__assign_str(lstr, foo);

// Note, the above is with my updates, without them, you need to duplicate the 
logic

//  __assign_str_len(lstr, foo, bar < strlen(foo) ? bar : 
strlen(foo));
),

TP_printk("%s", __get_str(lstr))
);


The above will allocate "bar < strlen(foo) ? bar : strlen(foo)" size on the
ring buffer. As the size is already stored, my clean up code uses that
instead of requiring duplicating the logic again.

-- Steve


Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-26 Thread Steven Rostedt
On Fri, 23 Feb 2024 13:46:53 -0500
Steven Rostedt  wrote:

> Now one thing I could do is to not remove the parameter, but just add:
> 
>   WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);
> 
> in the __assign_str() macro to make sure that it's still the same that is
> assigned. But I'm not sure how useful that is, and still causes burden to
> have it. I never really liked the passing of the string in two places to
> begin with.

Hmm, maybe I'll just add this patch for 6.9 and then in 6.10 do the
parameter removal.

-- Steve

diff --git a/include/trace/stages/stage6_event_callback.h 
b/include/trace/stages/stage6_event_callback.h
index 0c0f50bcdc56..7372e2c2a0c4 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -35,6 +35,7 @@ #define __assign_str(dst, src)
do {\
char *__str__ = __get_str(dst); \
int __len__ = __get_dynamic_array_len(dst) - 1; \
+   WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_);   \
memcpy(__str__, __data_offsets.dst##_ptr_ ? :   \
   EVENT_NULL_STR, __len__);\
__str__[__len__] = '\0';\


Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-26 Thread Steven Rostedt
On Fri, 23 Feb 2024 10:30:45 -0800
Jeff Johnson  wrote:

> On 2/23/2024 9:56 AM, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > [
> >This is a treewide change. I will likely re-create this patch again in
> >the second week of the merge window of v6.9 and submit it then. Hoping
> >to keep the conflicts that it will cause to a minimum.
> > ]
> > 
> > With the rework of how the __string() handles dynamic strings where it
> > saves off the source string in field in the helper structure[1], the
> > assignment of that value to the trace event field is stored in the helper
> > value and does not need to be passed in again.  
> 
> Just curious if this could be done piecemeal by first changing the
> macros to be variadic macros which allows you to ignore the extra
> argument. The callers could then be modified in their separate trees.
> And then once all the callers have be merged, the macros could be
> changed to no longer be variadic.

I weighed doing that, but I think ripping off the band-aid is a better
approach. One thing I found is that leaving unused parameters in the macros
can cause bugs itself. I found one case doing my clean up, where an unused
parameter in one of the macros was bogus, and when I made it a used
parameter, it broke the build.

I think for tree-wide changes, the preferred approach is to do one big
patch at once. And since this only affects TRACE_EVENT() macros, it
hopefully would not be too much of a burden (although out of tree users may
suffer from this, but do we care?)

Now one thing I could do is to not remove the parameter, but just add:

WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);

in the __assign_str() macro to make sure that it's still the same that is
assigned. But I'm not sure how useful that is, and still causes burden to
have it. I never really liked the passing of the string in two places to
begin with.

-- Steve


Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-26 Thread Steven Rostedt
On Fri, 23 Feb 2024 12:56:34 -0500
Steven Rostedt  wrote:

> Note, the same updates will need to be done for:
> 
>   __assign_str_len()
>   __assign_rel_str()
>   __assign_rel_str_len()

Correction: The below macros do not pass in their source to the entry
macros, so they will not need to be updated.

-- Steve

>   __assign_bitmask()
>   __assign_rel_bitmask()
>   __assign_cpumask()
>   __assign_rel_cpumask()



Re: [PATCH v2 6/6] drm: add drm_mode_atomic_commit event

2024-02-16 Thread Steven Rostedt
On Fri, 16 Feb 2024 17:37:23 +0100
Daniel Vetter  wrote:

> > > @@ -1503,6 +1504,24 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
> > >   drm_mode_object_put(obj);
> > >   }
> > >  
> > > + if (trace_drm_mode_atomic_commit_enabled()) {
> > > + struct drm_crtc_state *crtc_state;
> > > + struct drm_crtc *crtc;
> > > + int *crtcs;
> > > + int i, num_crtcs;
> > > +
> > > + crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(int),
> > > + GFP_KERNEL);  
> > 
> > If the above allocation fails, this will cause a NULL kernel dereference.  
> 
> Yeah can't we somehow iterate directly into the trace subsystem? If
> nothing else works I guess just a per-crtc event should do.

You mean like this?

  https://lore.kernel.org/all/20240216105934.7b81e...@gandalf.local.home/

;-)

-- Steve



Re: [PATCH v3 6/8] drm: add drm_mode_atomic_commit event

2024-02-16 Thread Steven Rostedt
On Fri, 16 Feb 2024 16:09:55 +0100
Pierre-Eric Pelloux-Prayer  wrote:
> 
> Signed-off-by: Pierre-Eric Pelloux-Prayer 
> ---
>  drivers/gpu/drm/drm_atomic_uapi.c | 21 +
>  drivers/gpu/drm/drm_trace.h   | 23 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
> b/drivers/gpu/drm/drm_atomic_uapi.c
> index 29d4940188d4..f31b5c6f870b 100644
> --- a/drivers/gpu/drm/drm_atomic_uapi.c
> +++ b/drivers/gpu/drm/drm_atomic_uapi.c
> @@ -41,6 +41,7 @@
>  #include 
>  
>  #include "drm_crtc_internal.h"
> +#include "drm_trace.h"
>  
>  /**
>   * DOC: overview
> @@ -1503,6 +1504,26 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
>   drm_mode_object_put(obj);
>   }
>  
> + if (trace_drm_mode_atomic_commit_enabled()) {
> + struct drm_crtc_state *crtc_state;
> + struct drm_crtc *crtc;
> + int *crtcs;
> + int i, num_crtcs;
> +
> + crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(int),
> + GFP_KERNEL);
> +
> + if (crtcs) {
> + num_crtcs = 0;
> + for_each_new_crtc_in_state(state, crtc, crtc_state, i)
> + crtcs[num_crtcs++] = drm_crtc_index(crtc);

Hmm, looking deeper into this, could you just do the loop the trace event?

That is how different is the config.num_crtc compared to the final num_crtcs?
That way, we don't need to do this allocation if it's not too different.
That is, pass in the dev->mode_config.num_crtc to the tracepoint instead of
num_crtcs.

> +
> + trace_drm_mode_atomic_commit(file_priv, crtcs, 
> num_crtcs, arg->flags);
> +
> + kfree(crtcs);
> + }
> + }
> +
>   ret = prepare_signaling(dev, state, arg, file_priv, &fence_state,
>   &num_fences);
>   if (ret)
> diff --git a/drivers/gpu/drm/drm_trace.h b/drivers/gpu/drm/drm_trace.h
> index 11c6dd577e8e..63489923c289 100644
> --- a/drivers/gpu/drm/drm_trace.h
> +++ b/drivers/gpu/drm/drm_trace.h
> @@ -66,6 +66,29 @@ TRACE_EVENT(drm_vblank_event_delivered,
> __entry->seq)
>  );
>  
> +TRACE_EVENT(drm_mode_atomic_commit,
> + TP_PROTO(struct drm_file *file, int *crtcs, int ncrtcs, uint32_t 
> flags),
> + TP_ARGS(file, crtcs, ncrtcs, flags),
> + TP_STRUCT__entry(
> + __field(struct drm_file *, file)
> + __dynamic_array(u32, crtcs, ncrtcs)

Here the ncrtcs is what is passed in. It will always be allocated to that
size though.

> + __field(uint32_t, ncrtcs)
> + __field(uint32_t, flags)
> + ),
> + TP_fast_assign(
> + unsigned int i;
> +
> + __entry->file = file;
> + for (i = 0; i < ncrtcs; i++)
> + ((u32 *)__get_dynamic_array(crtcs))[i] = crtcs[i];

Here we have:

int n = 0;

for_each_new_crtc_in_state(state, crtc, crtc_state, i)
((u32 *)__get_dynamic_array(crtcs))[n++] = 
drm_crtc_index(crtc);

__entry->ncrtcs = n;

But this is only viable if the ncrtcs is close to the same size as 
dev->mode_config.num_crtc,
otherwise it's not worth it.

-- Steve



> + __entry->ncrtcs = ncrtcs;
> + __entry->flags = flags;
> + ),
> + TP_printk("file=%p, pid=%8d, flags=%08x, crtcs=%s", __entry->file,
> +   pid_nr(__entry->file->pid), __entry->flags,
> +   __print_array(__get_dynamic_array(crtcs), 
> __entry->ncrtcs, 4))
> +);
> +
>  #endif /* _DRM_TRACE_H_ */
>  
>  /* This part must be outside protection */




Re: [PATCH v2 1/6] tracing, dma-buf: add a trace_dma_fence_sync_to event

2024-02-14 Thread Steven Rostedt
On Wed, 14 Feb 2024 13:00:16 +0100
Christian König  wrote:

> > +DEFINE_EVENT(dma_fence_from, dma_fence_sync_to,  
> 
> For a single event you should probably use TRACE_EVENT() instead of 
> declaring a class. A class is only used if you have multiple events with 
> the same parameters.

FYI, TRACE_EVENT() is actually defined as:

#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DECLARE_EVENT_CLASS(name,  \
 PARAMS(proto),\
 PARAMS(args), \
 PARAMS(tstruct),  \
 PARAMS(assign),   \
 PARAMS(print));   \
DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));

So basically, you could really just declare one TRACE_EVENT() and add
DEFINE_EVENT()s on top of it ;)

I never recommended that because I thought it would be confusing.

-- Steve



Re: [PATCH v2 6/6] drm: add drm_mode_atomic_commit event

2024-02-13 Thread Steven Rostedt
On Tue, 13 Feb 2024 16:50:31 +0100
Pierre-Eric Pelloux-Prayer  wrote:

> @@ -1503,6 +1504,24 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
>   drm_mode_object_put(obj);
>   }
>  
> + if (trace_drm_mode_atomic_commit_enabled()) {
> + struct drm_crtc_state *crtc_state;
> + struct drm_crtc *crtc;
> + int *crtcs;
> + int i, num_crtcs;
> +
> + crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(int),
> + GFP_KERNEL);

If the above allocation fails, this will cause a NULL kernel dereference.

-- Steve

> +
> + num_crtcs = 0;
> + for_each_new_crtc_in_state(state, crtc, crtc_state, i)
> + crtcs[num_crtcs++] = drm_crtc_index(crtc);
> +
> + trace_drm_mode_atomic_commit(file_priv, crtcs, num_crtcs, 
> arg->flags);
> +
> + kfree(crtcs);
> + }
> +
>   ret = prepare_signaling(dev, state, arg, file_priv, &fence_state,
>   &num_fences);
>   if (ret)



Re: [linux-next:master] BUILD REGRESSION 2dac75696c6da3c848daa118a729827541c89d33

2023-10-18 Thread Steven Rostedt
On Thu, 19 Oct 2023 04:07:35 +0800
kernel test robot  wrote:

> Documentation/devicetree/bindings/mfd/qcom,tcsr.yaml:
> Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml:
> fs/tracefs/event_inode.c:782:11-21: ERROR: ei is NULL but dereferenced.

This was already reported and I'm currently testing a patch to fix it.

-- Steve



Re: radeon.ko/i586: BUG: kernel NULL pointer dereference,address:00000004

2023-07-24 Thread Steven Rostedt
On Sat, 22 Jul 2023 11:30:14 +0900
 wrote:

> >> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> >> index 897cf02c20b1..801f4414da3e 100644
> >> --- a/arch/x86/include/asm/ftrace.h
> >> +++ b/arch/x86/include/asm/ftrace.h
> >> @@ -13,7 +13,7 @@
> >>  #ifdef CONFIG_HAVE_FENTRY
> >>  # include 
> >>  /* Add offset for endbr64 if IBT enabled */
> >> -# define FTRACE_MCOUNT_MAX_OFFSET ENDBR_INSN_SIZE
> >> +# define FTRACE_MCOUNT_MAX_OFFSET (ENDBR_INSN_SIZE + MCOUNT_INSN_SIZE)
> >>  #endif
> >>  
> >>  #ifdef CONFIG_DYNAMIC_FTRACE
> >>   
> 
> Above patch didn't work, but
> Does it matter that I am compiling with "gcc -fcf-protection=none"
> to not emit endbr32 instructions for i586?

This patch is supposed to address the case when ENDBR_INSN_SIZE is
zero. So I would think that that wouldn't matter.

-- Steve


Re: radeon.ko/i586: BUG: kernel NULL pointer dereference,address:00000004

2023-07-24 Thread Steven Rostedt
On Sun, 23 Jul 2023 20:55:06 +0900
 wrote:

> So I tried to trap NULL and return:
> 
>  patch-drm_vblank_cancel_pending_works-printk-NULL-ret.patch
> diff -up ./drivers/gpu/drm/drm_vblank_work.c.pk2 
> ./drivers/gpu/drm/drm_vblank_work.c
> --- ./drivers/gpu/drm/drm_vblank_work.c.pk2   2023-06-06 20:50:40.0 
> +0900
> +++ ./drivers/gpu/drm/drm_vblank_work.c   2023-07-23 14:29:56.383093673 
> +0900
> @@ -71,6 +71,10 @@ void drm_vblank_cancel_pending_works(str
>  {
>   struct drm_vblank_work *work, *next;
>  
> + if (!vblank->dev) {
> + printk(KERN_WARNING "%s: vblank->dev == NULL? returning\n", 
> __func__);
> + return;
> + }
>   assert_spin_locked(&vblank->dev->event_lock);
>  
>   list_for_each_entry_safe(work, next, &vblank->pending_work, node) {
> 
> 
> This time, the printk trap does not happen!! and radeon.ko works.
> (NULL check for vblank->worker is still fireing though)
> 
> Now this is puzzling.
> Is this a timing issue?

It could very well be. And the ftrace patch could possibly not be the
cause at all. But the thread that is created to do the work is causing
the race window to be opened up, which is why you see it with the patch
and don't without it. It may not be the problem, it may just tickle the
timings enough to trigger the bug, and is causing you to go on a wild
goose chase in the wrong direction.

-- Steve


> Is systemd-udevd doing something not favaorble to kernel?
> Is drm vblank code running without enough initialization?
> 
> Puzzling is, that purely useland activity
> (logging in on tty1 before radeon.ko load)
> is affecting kernel panic/no-panic.



Re: radeon.ko/i586: BUG: kernel NULL pointer dereference, address:00000004

2023-07-17 Thread Steven Rostedt
On Fri, 14 Jul 2023 14:34:04 +0900
 wrote:

> Patch in
> https://bugzilla.kernel.org/show_bug.cgi?id=217669#c4
> fixed the problem in freedesktop.org kernel 5.18.0-rc2 .
> This may explain that in kernel.org tree, the said commit is in kernel-5.19.

You mean the patch that adds:

#if defined(FTRACE_MCOUNT_MAX_OFFSET) && (FTRACE_MCOUNT_MAX_OFFSET)

?

Nothing should be setting FTRACE_MCOUNT_MAX_OFFSET to anything but non
zero. But doing a grep, I now see:

# define FTRACE_MCOUNT_MAX_OFFSET ENDBR_INSN_SIZE

Where it breaks that assumption if ENDBR_INSN_SIZE == 0 :-p
 (and that's my code!)

OK, does this fix it? (I haven't tested nor compiled this)

-- Steve

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..801f4414da3e 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -13,7 +13,7 @@
 #ifdef CONFIG_HAVE_FENTRY
 # include 
 /* Add offset for endbr64 if IBT enabled */
-# define FTRACE_MCOUNT_MAX_OFFSET  ENDBR_INSN_SIZE
+# define FTRACE_MCOUNT_MAX_OFFSET  (ENDBR_INSN_SIZE + MCOUNT_INSN_SIZE)
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE


Re: radeon.ko/i586: BUG: kernel NULL pointer dereference,address:00000004

2023-07-17 Thread Steven Rostedt
On Sat, 15 Jul 2023 11:39:11 +0900
 wrote:


> Yes, this is puzzling. That's why I need other people's opinion on this.
> Does it matter the DUT is a slow machine (Pentium 120MHz)?
> 

Hmm, I wonder because the workqueue is running __init functions, could it
possibly be that it didn't finish before the end of boot, where it frees
all the functions? It shouldn't do that because there's code to make sure
it's done before it continues further.

But just in case something isn't working as planned, you could try this
patch to see if the bug goes away.

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 05c0024815bf..af5a40ef3be4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3771,13 +3771,13 @@ static int test_for_valid_rec(struct dyn_ftrace *rec)
return 1;
 }
 
-static struct workqueue_struct *ftrace_check_wq __initdata;
-static struct work_struct ftrace_check_work __initdata;
+static struct workqueue_struct *ftrace_check_wq;
+static struct work_struct ftrace_check_work;
 
 /*
  * Scan all the mcount/fentry entries to make sure they are valid.
  */
-static __init void ftrace_check_work_func(struct work_struct *work)
+static void ftrace_check_work_func(struct work_struct *work)
 {
struct ftrace_page *pg;
struct dyn_ftrace *rec;


Re: radeon.ko/i586: BUG: kernel NULL pointer dereference, address:00000004

2023-07-14 Thread Steven Rostedt
On Fri, 14 Jul 2023 14:34:04 +0900
 wrote:

> >> > So I'm confused about why it's mentioned. Was it backported?  
> >> 
> >> Taketo Kabe, could you please help to clean this confusion up? Did you
> >> mean 5.19 in https://bugzilla.kernel.org/show_bug.cgi?id=217669#c5 ? And
> >> BTW: did you really use a vanilla kernel for your bisection?  
> 
> 
> Reporter Me:
> I bisected using freedesktop.org kernel tree, which git commit ID is
> in sync with kernel.org
> but version number in ./Makefile could be slighty behind. 
> 
> Patch in
> https://bugzilla.kernel.org/show_bug.cgi?id=217669#c4
> fixed the problem in freedesktop.org kernel 5.18.0-rc2 .
> This may explain that in kernel.org tree, the said commit is in kernel-5.19.

Even if the bisect did land on this commit, it doesn't make sense. I would
think that one of the results of the bisect was incorrect (a pass that
should have failed?), as that would lead the bisect down to the wrong
conclusion.

Now if you you remove this commit and everything works fine, and add it
back again and it fails reliably, then I can't argue it is not the commit.

But the commit in question kicks off a worker thread at boot up to search
for weak functions that were tagged to be traced by the function tracer and
sets them to "disabled" to never be traced.

Is the function tracer used at all here? I really do not see how this
commit affects the code that is crashing. Unless there's something wrong
with the way the kworker was set up and it corrupted other kworkers :-/

-- Steve


Re: radeon.ko/i586: BUG: kernel NULL pointer dereference, address: 00000004

2023-07-14 Thread Steven Rostedt
On Fri, 14 Jul 2023 09:50:17 +0700
Bagas Sanjaya  wrote:

> Hi,
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
> 
> See Bugzilla for the full thread and attached patches that fixes
> this regression.
> 
> Later, when bisecting, the reporter got better kernel trace:
> 
> > [  469.825305] BUG: kernel NULL pointer dereference, address: 0004
> > [  469.830502] #PF: supervisor read access in kernel mode
> > [  469.830502] #PF: error_code(0x) - not-present page
> > [  469.830502] *pde = 
> > [  469.830502] Oops:  [#1] PREEMPT SMP
> > [  469.830502] CPU: 0 PID: 365 Comm: systemd-udevd Not tainted 
> > 5.14.0-221.el9.v1.i586 #1

This is a 5.14 kernel right?

> > [  469.830502] Hardware name: System Manufacturer System Name/ALADDIN5, 
> > BIOS 0626 07/15/95
> > [  469.830502] EIP: _raw_spin_lock_irqsave+0x1f/0x40
> > [  469.830502] Code: cc cc cc cc cc cc cc 3e cc cc cc 3e 55 89 c1 89 55 89 
> > c1 89 5b fa 64 ff 5b fa 64 ff c2 31 d2 be c2 31 d2 be 89 d0 3e 0f 89 d0 
> > <3e> 0f 89 d8 5b 5e 89 d8 5b 5e 26 00 90 89 26 00 90 89 b7 15 75 ff
> > [  469.830502] EAX:  EBX: 0246 ECX: 0004 EDX: 
> > [  469.830502] ESI: 0001 EDI: c3e71c40 EBP: c3e71c34 ESP: c3e71c2c
> > [  469.830502] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010046
> > [  469.830502] CR0: 80050033 CR2: 0004 CR3: 057fa000 CR4: 0010
> > [  469.830502] Call Trace:
> > [  469.830502]  kthread_flush_worker+0x52/0xa0
> > [  469.830502]  ? kthread_should_park+0x40/0x40
> > [  469.830502]  drm_crtc_vblank_off+0x1d7/0x230 [drm]
> > [  469.830502]  radeon_crtc_dpms+0x197/0x1b0 [radeon]
> > [  469.830502]  radeon_crtc_disable+0x16/0xa0 [radeon]
> > [  469.830502]  __drm_helper_disable_unused_functions+0x74/0xc0 
> > [drm_kms_helper]
> > [  469.830502]  drm_helper_disable_unused_functions+0x3c/0x50 
> > [drm_kms_helper]
> > [  469.830502]  radeon_fbdev_init+0xb0/0x130 [radeon]
> > [  469.830502]  radeon_modeset_init+0x25d/0x320 [radeon]
> > [  469.830502]  radeon_driver_load_kms+0xc4/0x240 [radeon]
> > [  469.830502]  drm_dev_register+0xb4/0x1a0 [drm]
> > [  469.830502]  radeon_pci_probe+0xc0/0x100 [radeon]
> > [  469.830502]  pci_device_probe+0xbc/0x150
> > [  469.830502]  really_probe+0xb7/0x350
> > [  469.830502]  __driver_probe_device+0x109/0x1e0
> > [  469.830502]  driver_probe_device+0x1f/0x90
> > [  469.830502]  __driver_attach+0x8a/0x1b0
> > [  469.830502]  ? __device_attach_driver+0x100/0x100
> > [  469.830502]  bus_for_each_dev+0x58/0x90
> > [  469.830502]  driver_attach+0x19/0x20
> > [  469.830502]  ? __device_attach_driver+0x100/0x100
> > [  469.830502]  bus_add_driver+0x12f/0x1d0
> > [  469.830502]  driver_register+0x79/0xd0
> > [  469.830502]  ? 0xf7dde000
> > [  469.830502]  __pci_register_driver+0x52/0x60
> > [  469.830502]  radeon_module_init+0x5c/0x1000 [radeon]
> > [  469.830502]  do_one_initcall+0x3e/0x1c0
> > [  469.830502]  ? __vunmap+0x20b/0x2a0
> > [  469.830502]  ? __vunmap+0x20b/0x2a0
> > [  469.830502]  ? kmem_cache_alloc_trace+0x38/0x440
> > [  469.830502]  do_init_module+0x52/0x260
> > [  469.830502]  load_module+0x930/0x9b0
> > [  469.830502]  __ia32_sys_init_module+0x15d/0x180
> > [  469.830502]  do_int80_syscall_32+0x2e/0x80
> > [  469.830502]  entry_INT80_32+0xf0/0xf0
> > [  469.830502] EIP: 0xb79e7e4e
> > [  469.830502] Code: 0f 83 d6 06 00 00 c3 66 90 66 90 90 57 56 53 8b 7c 24 
> > 20 8b 74 24 1c 8b 54 24 18 8b 4c 24 14 8b 5c 24 10 b8 80 00 00 00 cd 80 
> > <5b> 5e 5f 3d 01 f0 ff ff 0f 83 a4 06 00 00 c3 66 90 90 53 8b 54 24
> > [  469.830502] EAX: ffda EBX: b5526010 ECX: 0020d79c EDX: b7c26274
> > [  469.830502] ESI: b7c20295 EDI: b7c2ddd8 EBP: 018af7c0 ESP: bfd2f810
> > [  469.830502] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 0292
> > [  469.830502] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
> > nft_fib radeon(+) nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
> > nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
> > gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec drm_ttm_helper 
> > ppdev ttm rfkill ip_set nf_tables libcrc32c nfnetlink drm_kms_helper pcspkr 
> > syscopyarea e100 sysfillrect parport_pc sysimgblt mii fb_sys_fops parport 
> > qrtr drm fuse ext4 mbcache jbd2 sd_mod t10_pi sr_mod crc64_rocksoft_generic 
> > cdrom crc64_rocksoft crc64 sg ata_generic pata_ali libata serio_raw
> > [  469.830502] CR2: 0004
> > [  469.830502] ---[ end trace 30555bd5ee4bee23 ]---
> > [  469.830502] EIP: _raw_spin_lock_irqsave+0x1f/0x40
> > [  469.830502] Code: cc cc cc cc cc cc cc 3e cc cc cc 3e 55 89 c1 89 55 89 
> > c1 89 5b fa 64 ff 5b fa 64 ff c2 31 d2 be c2 31 d2 be 89 d0 3e 0f 89 d0 
> > <3e> 0f 89 d8 5b 5e 89 d8 5b 5e 26 00 90 89 26 00 90 89 b7 15 75 ff
> > [  469.830502] EAX:  EBX: 0246 ECX: 0004 EDX: 
> > [  469.830502] ESI: 0001 EDI: c3e71c40 EBP: c3e71c34 ESP: c3e71c2c
> > [  469.830502] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS:

Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-16 Thread Steven Rostedt
On Wed, 15 Mar 2023 20:21:33 -0400
Steven Rostedt  wrote:

> On Wed, 15 Mar 2023 16:20:11 -0400
> Steven Rostedt  wrote:
> 
> > On Wed, 15 Mar 2023 20:51:49 +0100
> > Christian König  wrote:
> >   
> > > Steven please try the attached patch.
> > 
> > I applied it, but as it's not always reproducible, I'll have to give it
> > several runs before I give you my "tested-by" tag.  
> 
> I ran my tests a bunch of times with this applied and it didn't fail once.
> As it got further than it ever did before (it usually took 1 to 3 runs to
> trigger, and I ran it over 10 times).
> 
> Reported-by: Steven Rostedt (Google) 
> Tested-by: Steven Rostedt (Google) 

I hope that this gets in by -rc3, as I want to start basing my next branch
on that tag.

Thanks,

-- Steve



Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-16 Thread Steven Rostedt
On Wed, 15 Mar 2023 16:20:11 -0400
Steven Rostedt  wrote:

> On Wed, 15 Mar 2023 20:51:49 +0100
> Christian König  wrote:
> 
> > Steven please try the attached patch.  
> 
> I applied it, but as it's not always reproducible, I'll have to give it
> several runs before I give you my "tested-by" tag.

I ran my tests a bunch of times with this applied and it didn't fail once.
As it got further than it ever did before (it usually took 1 to 3 runs to
trigger, and I ran it over 10 times).

Reported-by: Steven Rostedt (Google) 
Tested-by: Steven Rostedt (Google) 

Thanks!

-- Steve


Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-15 Thread Steven Rostedt
On Wed, 15 Mar 2023 11:57:12 -0400
Steven Rostedt  wrote:

So I'm looking at the backtraces.

> The WARN_ON triggered:
> 
> [   21.481449] mpls_gso: MPLS GSO support
> [   21.488795] IPI shorthand broadcast: enabled
> [   21.488873] [ cut here ]
> [   21.490101] [ cut here ]
> 
> [   21.491693] WARNING: CPU: 1 PID: 38 at drivers/gpu/drm/ttm/ttm_bo.c:332 
> ttm_bo_release+0x2ac/0x2fc  <<< Line of the added WARN_ON()

This happened on CPU 1.

> 
> [   21.492940] refcount_t: underflow; use-after-free.
> [   21.492965] WARNING: CPU: 0 PID: 84 at lib/refcount.c:28 
> refcount_warn_saturate+0xb6/0xfc

This happened on CPU 0.

> [   21.496116] Modules linked in:
> [   21.497197] Modules linked in:
> [   21.500105] CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 
> 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
> [   21.500789] CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 
> 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
> [   21.501882] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.0-debian-1.16.0-5 04/01/2014
> [   21.503533] sched_clock: Marking stable (20788024762, 
> 714243692)->(22140778105, -638509651)
> [   21.504080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.0-debian-1.16.0-5 04/01/2014
> [   21.504089] Workqueue: ttm ttm_bo_delayed_delete
> [   21.507196] Workqueue: events drm_fb_helper_damage_work
> [   21.509235] 
> [   21.510291] registered taskstats version 1
> [   21.510302] Running ring buffer tests...
> [   21.511792] 
> [   21.513870] EIP: refcount_warn_saturate+0xb6/0xfc
> [   21.515261] EIP: ttm_bo_release+0x2ac/0x2fc
> [   21.516566] Code: 68 00 27 0c d8 e8 36 3b aa ff 0f 0b 58 c9 c3 90 80 3d 41 
> c2 37 d8 00 75 8a c6 05 41 c2 37 d8 01 68 2c 27 0c d8 e8 16 3b aa ff <0f> 0b 
> 59 c9 c3 80 3d 3f c2 37 d8 00 0f 85 67 ff ff ff c6 05 3f c2
> [   21.516998] Code: ff 8d b4 26 00 00 00 00 66 90 0f 0b 8b 43 10 85 c0 0f 84 
> a1 fd ff ff 8d 76 00 0f 0b 8b 43 28 85 c0 0f 84 9c fd ff ff 8d 76 00 <0f> 0b 
> e9 92 fd ff ff 8d b4 26 00 00 00 00 66 90 c7 43 18 00 00 00
> [   21.517905] EAX: 0026 EBX: c129d150 ECX: 0040 EDX: 0002
> [   21.518987] EAX: d78c8550 EBX: c129d134 ECX: c129d134 EDX: 0001
> [   21.519337] ESI: c129d0bc EDI: f6f91200 EBP: c2b8bf18 ESP: c2b8bf14
> [   21.520617] ESI: c129d000 EDI: c126a7a0 EBP: c1839c24 ESP: c1839bec
> [   21.521546] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
> [   21.526154] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
> [   21.526162] CR0: 80050033 CR2:  CR3: 18506000 CR4: 00150ef0
> [   21.526166] Call Trace:
> [   21.526189]  ? ww_mutex_unlock+0x3a/0x94
> [   21.530300] CR0: 80050033 CR2: ff9ff000 CR3: 18506000 CR4: 00150ef0
> [   21.531722]  ? ttm_bo_cleanup_refs+0xc4/0x1e0
> [   21.533114] Call Trace:
> [   21.534516]  ttm_mem_evict_first+0x3d3/0x568
> [   21.535901]  ttm_bo_delayed_delete+0x9c/0xa4
> [   21.537391]  ? kfree+0x6b/0xdc
> [   21.538901]  process_one_work+0x21a/0x484
> [   21.540279]  ? ttm_range_man_alloc+0xe0/0xec
> [   21.540854]  worker_thread+0x14a/0x39c
> [   21.541714]  ? ttm_range_man_fini_nocheck+0xe8/0xe8
> [   21.543332]  kthread+0xea/0x10c
> [   21.544301]  ttm_bo_mem_space+0x1d0/0x1e4
> [   21.544942]  ? process_one_work+0x484/0x484
> [   21.545887]  ttm_bo_validate+0xc5/0x19c
> [   21.546986]  ? kthread_complete_and_exit+0x1c/0x1c
> [   21.547680]  ttm_bo_init_reserved+0x15e/0x1fc
> [   21.548716]  ret_from_fork+0x1c/0x28
> [   21.549650]  qxl_bo_create+0x145/0x20c

The qxl_bo_create() calls ttm_bo_init_reserved() as the object in question
is about to be freed.

I'm guessing what is happening here, is that an object was to be freed by
the delayed_delete, and in the mean time, something else picked it up.

What's protecting this from not being used again?

-- Steve



Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-15 Thread Steven Rostedt
On Wed, 15 Mar 2023 20:51:49 +0100
Christian König  wrote:

> Steven please try the attached patch.

I applied it, but as it's not always reproducible, I'll have to give it
several runs before I give you my "tested-by" tag.

-- Steve


Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-15 Thread Steven Rostedt
On Wed, 15 Mar 2023 16:25:11 +0100
Christian König  wrote:
> >>
> >> Thanks for the notice,  
> > I'm still getting this on Linus's latest tree.  
> 
> This must be some reference counting issue which only happens in your 
> particular use case. We have tested this quite extensively and couldn't 
> reproduce it so far.

Have you tried 32 bit with my config. I also sent a link to your previous
email that gives access to the VM image I'm using that is triggering this
issue.

Here it is again:

  The libvirt xml file is here: https://rostedt.org/vm-images/tracetest-32.xml
  and the VM image itself is here: 
https://rostedt.org/vm-images/tracetest-32.qcow2.bz2

> 
> Can you apply this code snippet here and see if you get any warning in 
> the system logs?
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 459f1b4440da..efc390bfd69c 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -314,6 +314,7 @@ static void ttm_bo_delayed_delete(struct work_struct 
> *work)
>      dma_resv_lock(bo->base.resv, NULL);
>      ttm_bo_cleanup_memtype_use(bo);
>      dma_resv_unlock(bo->base.resv);
> +   bo->delayed_delete.func = NULL;
>      ttm_bo_put(bo);
>   }
> 
> @@ -327,6 +328,8 @@ static void ttm_bo_release(struct kref *kref)
>      WARN_ON_ONCE(bo->pin_count);
>      WARN_ON_ONCE(bo->bulk_move);
> 
> +   WARN_ON(bo->delayed_delete.func != NULL);
> +
>      if (!bo->deleted) {
>      ret = ttm_bo_individualize_resv(bo);
>      if (ret) {
> 

The WARN_ON triggered:

[   21.481449] mpls_gso: MPLS GSO support
[   21.488795] IPI shorthand broadcast: enabled
[   21.488873] [ cut here ]
[   21.490101] [ cut here ]

[   21.491693] WARNING: CPU: 1 PID: 38 at drivers/gpu/drm/ttm/ttm_bo.c:332 
ttm_bo_release+0x2ac/0x2fc  <<< Line of the added WARN_ON()

[   21.492940] refcount_t: underflow; use-after-free.
[   21.492965] WARNING: CPU: 0 PID: 84 at lib/refcount.c:28 
refcount_warn_saturate+0xb6/0xfc
[   21.496116] Modules linked in:
[   21.497197] Modules linked in:
[   21.500105] CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 
6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
[   21.500789] CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 
6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
[   21.501882] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-debian-1.16.0-5 04/01/2014
[   21.503533] sched_clock: Marking stable (20788024762, 
714243692)->(22140778105, -638509651)
[   21.504080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-debian-1.16.0-5 04/01/2014
[   21.504089] Workqueue: ttm ttm_bo_delayed_delete
[   21.507196] Workqueue: events drm_fb_helper_damage_work
[   21.509235] 
[   21.510291] registered taskstats version 1
[   21.510302] Running ring buffer tests...
[   21.511792] 
[   21.513870] EIP: refcount_warn_saturate+0xb6/0xfc
[   21.515261] EIP: ttm_bo_release+0x2ac/0x2fc
[   21.516566] Code: 68 00 27 0c d8 e8 36 3b aa ff 0f 0b 58 c9 c3 90 80 3d 41 
c2 37 d8 00 75 8a c6 05 41 c2 37 d8 01 68 2c 27 0c d8 e8 16 3b aa ff <0f> 0b 59 
c9 c3 80 3d 3f c2 37 d8 00 0f 85 67 ff ff ff c6 05 3f c2
[   21.516998] Code: ff 8d b4 26 00 00 00 00 66 90 0f 0b 8b 43 10 85 c0 0f 84 
a1 fd ff ff 8d 76 00 0f 0b 8b 43 28 85 c0 0f 84 9c fd ff ff 8d 76 00 <0f> 0b e9 
92 fd ff ff 8d b4 26 00 00 00 00 66 90 c7 43 18 00 00 00
[   21.517905] EAX: 0026 EBX: c129d150 ECX: 0040 EDX: 0002
[   21.518987] EAX: d78c8550 EBX: c129d134 ECX: c129d134 EDX: 0001
[   21.519337] ESI: c129d0bc EDI: f6f91200 EBP: c2b8bf18 ESP: c2b8bf14
[   21.520617] ESI: c129d000 EDI: c126a7a0 EBP: c1839c24 ESP: c1839bec
[   21.521546] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
[   21.526154] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
[   21.526162] CR0: 80050033 CR2:  CR3: 18506000 CR4: 00150ef0
[   21.526166] Call Trace:
[   21.526189]  ? ww_mutex_unlock+0x3a/0x94
[   21.530300] CR0: 80050033 CR2: ff9ff000 CR3: 18506000 CR4: 00150ef0
[   21.531722]  ? ttm_bo_cleanup_refs+0xc4/0x1e0
[   21.533114] Call Trace:
[   21.534516]  ttm_mem_evict_first+0x3d3/0x568
[   21.535901]  ttm_bo_delayed_delete+0x9c/0xa4
[   21.537391]  ? kfree+0x6b/0xdc
[   21.538901]  process_one_work+0x21a/0x484
[   21.540279]  ? ttm_range_man_alloc+0xe0/0xec
[   21.540854]  worker_thread+0x14a/0x39c
[   21.541714]  ? ttm_range_man_fini_nocheck+0xe8/0xe8
[   21.543332]  kthread+0xea/0x10c
[   21.544301]  ttm_bo_mem_space+0x1d0/0x1e4
[   21.544942]  ? process_one_work+0x484/0x484
[   21.545887]  ttm_bo_validate+0xc5/0x19c
[   21.546986]  ? kthread_complete_and_exit+0x1c/0x1c
[   21.547680]  ttm_bo_init_reserved+0x15e/0x1fc
[   21.548716]  ret_from_fork+0x1c/0x28
[   21.549650]  qxl_bo_create+0x145/0x20c

Note, this is all on boot up before user space is running.

-- Steve


Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-15 Thread Steven Rostedt
On Wed, 15 Mar 2023 11:57:12 -0400
Steven Rostedt  wrote:

> The WARN_ON triggered:
> 
> [   21.481449] mpls_gso: MPLS GSO support
> [   21.488795] IPI shorthand broadcast: enabled
> [   21.488873] [ cut here ]
> [   21.490101] [ cut here ]
> 
> [   21.491693] WARNING: CPU: 1 PID: 38 at drivers/gpu/drm/ttm/ttm_bo.c:332 
> ttm_bo_release+0x2ac/0x2fc  <<< Line of the added WARN_ON()
> 
> [   21.492940] refcount_t: underflow; use-after-free.
> [   21.492965] WARNING: CPU: 0 PID: 84 at lib/refcount.c:28 
> refcount_warn_saturate+0xb6/0xfc
> [   21.496116] Modules linked in:
> [   21.497197] Modules linked in:
> [   21.500105] CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 
> 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
> [   21.500789] CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 
> 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
> [   21.501882] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.0-debian-1.16.0-5 04/01/2014
> [   21.503533] sched_clock: Marking stable (20788024762, 
> 714243692)->(22140778105, -638509651)
> [   21.504080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.0-debian-1.16.0-5 04/01/2014
> [   21.504089] Workqueue: ttm ttm_bo_delayed_delete
> [   21.507196] Workqueue: events drm_fb_helper_damage_work
> [   21.509235] 
> [   21.510291] registered taskstats version 1
> [   21.510302] Running ring buffer tests...
> [   21.511792] 
> [   21.513870] EIP: refcount_warn_saturate+0xb6/0xfc
> [   21.515261] EIP: ttm_bo_release+0x2ac/0x2fc
> [   21.516566] Code: 68 00 27 0c d8 e8 36 3b aa ff 0f 0b 58 c9 c3 90 80 3d 41 
> c2 37 d8 00 75 8a c6 05 41 c2 37 d8 01 68 2c 27 0c d8 e8 16 3b aa ff <0f> 0b 
> 59 c9 c3 80 3d 3f c2 37 d8 00 0f 85 67 ff ff ff c6 05 3f c2
> [   21.516998] Code: ff 8d b4 26 00 00 00 00 66 90 0f 0b 8b 43 10 85 c0 0f 84 
> a1 fd ff ff 8d 76 00 0f 0b 8b 43 28 85 c0 0f 84 9c fd ff ff 8d 76 00 <0f> 0b 
> e9 92 fd ff ff 8d b4 26 00 00 00 00 66 90 c7 43 18 00 00 00
> [   21.517905] EAX: 0026 EBX: c129d150 ECX: 0040 EDX: 0002
> [   21.518987] EAX: d78c8550 EBX: c129d134 ECX: c129d134 EDX: 0001
> [   21.519337] ESI: c129d0bc EDI: f6f91200 EBP: c2b8bf18 ESP: c2b8bf14
> [   21.520617] ESI: c129d000 EDI: c126a7a0 EBP: c1839c24 ESP: c1839bec
> [   21.521546] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
> [   21.526154] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010286
> [   21.526162] CR0: 80050033 CR2:  CR3: 18506000 CR4: 00150ef0
> [   21.526166] Call Trace:
> [   21.526189]  ? ww_mutex_unlock+0x3a/0x94
> [   21.530300] CR0: 80050033 CR2: ff9ff000 CR3: 18506000 CR4: 00150ef0
> [   21.531722]  ? ttm_bo_cleanup_refs+0xc4/0x1e0
> [   21.533114] Call Trace:
> [   21.534516]  ttm_mem_evict_first+0x3d3/0x568
> [   21.535901]  ttm_bo_delayed_delete+0x9c/0xa4
> [   21.537391]  ? kfree+0x6b/0xdc
> [   21.538901]  process_one_work+0x21a/0x484
> [   21.540279]  ? ttm_range_man_alloc+0xe0/0xec
> [   21.540854]  worker_thread+0x14a/0x39c
> [   21.541714]  ? ttm_range_man_fini_nocheck+0xe8/0xe8
> [   21.543332]  kthread+0xea/0x10c

So I triggered it again, and the same backtrace is there.

> [   21.544301]  ttm_bo_mem_space+0x1d0/0x1e4

It looks like the object is being reserved before it's fully removed. And
it's somewhere in this tty_bo_mem_space() (which comes from the
qxl_bo_create()).

I don't know this code at all, nor do I have any idea of what it's trying
to do. All I know is that this is triggering often (not always), and it has
to do with some race.

Now my config has lots of debugging enabled, which slows down the system
quite a bit. This also happens to open up race windows. Just because your
testing doesn't trigger it, doesn't mean that the race doesn't exist. It's
just likely to be very hard to hit.

> [   21.544942]  ? process_one_work+0x484/0x484
> [   21.545887]  ttm_bo_validate+0xc5/0x19c
> [   21.546986]  ? kthread_complete_and_exit+0x1c/0x1c
> [   21.547680]  ttm_bo_init_reserved+0x15e/0x1fc
> [   21.548716]  ret_from_fork+0x1c/0x28
> [   21.549650]  qxl_bo_create+0x145/0x20c

Here's the latest backtrace:

[  170.817449] [ cut here ]
[  170.817455] [ cut here ]
[  170.818210] refcount_t: underflow; use-after-free.
[  170.818228] WARNING: CPU: 0 PID: 267 at lib/refcount.c:28 
refcount_warn_saturate+0xb6/0xfc
[  170.819352] WARNING: CPU: 3 PID: 2382 at drivers/gpu/drm/ttm/ttm_bo.c:332 
ttm_bo_release+0x278/0x2c8
[  170.820124] Modules linked in:
[  170.822127] Modules linked in:
[  170.823829] 
[  170.823832] CPU: 0 PID: 267 Comm: kworker/0:10H Not tainted 
6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #998
[  170.824610] 
[  170.825121]

[BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-08 Thread Steven Rostedt


In a report for a regression in my code, I tried to run v6.3-rc1 through my
tests. It crashed at boot up on my first test (my start up tests do take a
long time, hence the 206 seconds of boot!).

[  206.238782] [ cut here ]
[  206.277786] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  206.277946] WARNING: CPU: 0 PID: 332 at kernel/locking/mutex.c:582 
__ww_mutex_lock.constprop.0+0x566/0xfec
[  206.313338] Modules linked in:
[  206.324732] CPU: 0 PID: 332 Comm: kworker/0:13H Not tainted 
6.3.0-rc1-test-1-ga98bd42762ed-dirty #965
[  206.338273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-debian-1.16.0-5 04/01/2014
[  206.353596] Workqueue: ttm ttm_bo_delayed_delete
[  206.370520] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.382855] Code: e8 ab 59 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 58 c0 3b cf 
85 c9 0f 85 17 fb ff ff 68 e0 8d 07 cf 68 2b ac 05 cf e8 e6 e6 3f ff <0f> 0b 58 
5a e9 ff fa ff ff e8 78 59 95 ff 85 c0 74 0e 8b 0d 58 c0
[  206.411247] EAX: 0028 EBX:  ECX: c3ae5dd8 EDX: 0002
[  206.425193] ESI:  EDI: c2d5f0bc EBP: c3ae5f00 ESP: c3ae5eac
[  206.439236] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010246
[  206.453597] CR0: 80050033 CR2: ff9ff000 CR3: 0f512000 CR4: 00150ef0
[  206.467841] Call Trace:
[  206.481059]  ? ttm_bo_delayed_delete+0x30/0x94
[  206.494980]  ww_mutex_lock+0x32/0x94
[  206.508699]  ttm_bo_delayed_delete+0x30/0x94
[  206.522371]  process_one_work+0x21a/0x538
[  206.536306]  worker_thread+0x146/0x398
[  206.549860]  kthread+0xea/0x10c
[  206.563141]  ? process_one_work+0x538/0x538
[  206.576835]  ? kthread_complete_and_exit+0x1c/0x1c
[  206.590652]  ret_from_fork+0x1c/0x28
[  206.604522] irq event stamp: 4219
[  206.617852] hardirqs last  enabled at (4219): [] 
_raw_spin_unlock_irqrestore+0x2d/0x58
[  206.633077] hardirqs last disabled at (4218): [] 
kvfree_call_rcu+0x155/0x2ec
[  206.648161] softirqs last  enabled at (3570): [] 
__do_softirq+0x2f3/0x48b
[  206.663025] softirqs last disabled at (3565): [] 
call_on_stack+0x45/0x4c
[  206.678065] ---[ end trace  ]---

Looks like there was a lock possibly used after free. But as commit
9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
workers") changed a lot of this code, I figured it may be the culprit.

-- Steve


Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-08 Thread Steven Rostedt
On Tue, 7 Mar 2023 21:22:23 -0500
Steven Rostedt  wrote:

> Looks like there was a lock possibly used after free. But as commit
> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
> workers") changed a lot of this code, I figured it may be the culprit.

If I bothered to look at the second warning after this one (I usually stop
after the first), it appears to state there was a use after free issue.

[  206.692285] [ cut here ]
[  206.706333] refcount_t: underflow; use-after-free.
[  206.720577] WARNING: CPU: 0 PID: 332 at lib/refcount.c:28 
refcount_warn_saturate+0xb6/0xfc
[  206.735810] Modules linked in:
[  206.749493] CPU: 0 PID: 332 Comm: kworker/0:13H Tainted: GW  
6.3.0-rc1-test-1-ga98bd42762ed-dirty #965
[  206.765833] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-debian-1.16.0-5 04/01/2014
[  206.781767] Workqueue: ttm ttm_bo_delayed_delete
[  206.796500] EIP: refcount_warn_saturate+0xb6/0xfc
[  206.811121] Code: 68 50 1c 0d cf e8 66 b3 a9 ff 0f 0b 58 c9 c3 90 80 3d 57 
c6 38 cf 00 75 8a c6 05 57 c6 38 cf 01 68 7c 1c 0d cf e8 46 b3 a9 ff <0f> 0b 59 
c9 c3 80 3d 55 c6 38 cf 00 0f 85 67 ff ff ff c6 05 55 c6
[  206.844560] EAX: 0026 EBX: c2d5f150 ECX: c3ae5e40 EDX: 0002
[  206.862109] ESI: c2d5f0bc EDI: f6f91200 EBP: c3ae5f18 ESP: c3ae5f14
[  206.878773] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010246
[  206.895665] CR0: 80050033 CR2: ff9ff000 CR3: 0f512000 CR4: 00150ef0
[  206.912303] Call Trace:
[  206.927940]  ttm_bo_delayed_delete+0x8c/0x94
[  206.944179]  process_one_work+0x21a/0x538
[  206.960605]  worker_thread+0x146/0x398
[  206.976839]  kthread+0xea/0x10c
[  206.992696]  ? process_one_work+0x538/0x538
[  207.008827]  ? kthread_complete_and_exit+0x1c/0x1c
[  207.025150]  ret_from_fork+0x1c/0x28
[  207.041307] irq event stamp: 4219
[  207.056883] hardirqs last  enabled at (4219): [] 
_raw_spin_unlock_irqrestore+0x2d/0x58
[  207.074298] hardirqs last disabled at (4218): [] 
kvfree_call_rcu+0x155/0x2ec
[  207.091461] softirqs last  enabled at (3570): [] 
__do_softirq+0x2f3/0x48b
[  207.107979] softirqs last disabled at (3565): [] 
call_on_stack+0x45/0x4c
[  207.123827] ---[ end trace  ]---


-- Steve


Re: [PATCH v2 26/27] dyndbg: 4 new trace-events: pr_debug, dev_dbg, drm_{,dev}debug

2022-06-29 Thread Steven Rostedt


Sorry for the late review. I finally got some time to look at this.

On Mon, 16 May 2022 16:56:39 -0600
Jim Cromie  wrote:


> diff --git a/include/trace/events/drm.h b/include/trace/events/drm.h
> new file mode 100644
> index ..6de80dd68620
> --- /dev/null
> +++ b/include/trace/events/drm.h
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM drm
> +
> +#if !defined(_TRACE_DRM_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_DRM_H
> +
> +#include 
> +
> +/* drm_debug() was called, pass its args */
> +TRACE_EVENT(drm_debug,
> + TP_PROTO(int drm_debug_category, struct va_format *vaf),
> +
> + TP_ARGS(drm_debug_category, vaf),
> +
> + TP_STRUCT__entry(
> + __field(int, drm_debug_category)
> + __dynamic_array(char, msg, 256)
> + ),
> +
> + TP_fast_assign(
> + int len;
> +
> + __entry->drm_debug_category = drm_debug_category;
> + vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va);
> +
> + len = strlen(__get_str(msg));
> + if (len > 0 && (__get_str(msg)[len - 1] == '\n'))
> + len -= 1;
> + __get_str(msg)[len] = 0;
> + ),
> +
> + TP_printk("%s", __get_str(msg))
> +);
> +
> +/* drm_devdbg() was called, pass its args, preserving order */
> +TRACE_EVENT(drm_devdbg,
> + TP_PROTO(const struct device *dev, int drm_debug_category, struct 
> va_format *vaf),
> +
> + TP_ARGS(dev, drm_debug_category, vaf),
> +
> + TP_STRUCT__entry(
> + __field(const struct device*, dev)
> + __field(int, drm_debug_category)
> + __dynamic_array(char, msg, 256)

You do not want to hardcode the 256 here. That will cause 256 bytes to be
reserved on the buffer, and you will not get that back. Might as well make
it a static array, as you also add 4 bytes to for the offset and size.

I think you want (haven't tested it)

__dynamic_array(char, msg, get_msg_size(vaf))

Where you have:

static unsigned int get_msg_size(struct va_format *vaf)
{
va_list aq;
unsigned int ret;

va_copy(aq, vaf->va);
ret = vsnprintf(NULL, 0, vaf->fmt, aq);
va_end(aq);

return min(ret + 1, 256);
}

What is in the last parameter of __dynamic_array() is used to calculate the
size needed to store the dynamic array.

Hmm, looking at other users of __dynamic_array(), this appears to be a
constant problem. I need to document this better.

-- Steve


> + ),
> +
> + TP_fast_assign(
> + int len;
> +
> + __entry->drm_debug_category = drm_debug_category;
> + __entry->dev = dev;
> + vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va);
> +
> + len = strlen(__get_str(msg));
> + if (len > 0 && (__get_str(msg)[len - 1] == '\n'))
> + len -= 1;
> + __get_str(msg)[len] = 0;
> + ),
> +
> + TP_printk("cat:%d, %s %s", __entry->drm_debug_category,
> +   dev_name(__entry->dev), __get_str(msg))
> +);
> +
> +#endif /* _TRACE_DRM_H */
> +


Re: [PATCH 1/3] lib/string_helpers: Consolidate yesno() implementation

2022-01-19 Thread Steven Rostedt
On Wed, 19 Jan 2022 21:25:08 +0200
Andy Shevchenko  wrote:

> > I say keep it one line!
> > 
> > Reviewed-by: Steven Rostedt (Google)   
> 
> I believe Sakari strongly follows the 80 rule, which means...

Checkpatch says "100" I think we need to simply update the docs (the
documentation always lags the code ;-)

bdc48fa11e46f

-- Steve


Re: [PATCH 1/3] lib/string_helpers: Consolidate yesno() implementation

2022-01-19 Thread Steven Rostedt
On Wed, 19 Jan 2022 21:22:57 +0200
Andy Shevchenko  wrote:

> On Wed, Jan 19, 2022 at 04:38:26PM +, David Laight wrote:
> > > > > > +static inline const char *yesno(bool v) { return v ? "yes" : "no"; 
> > > > > > }  
> > > 
> > >   return "yes\0no" + v * 4;
> > > 
> > > :-)  
> > 
> > except '"no\0\0yes" + v * 4' works a bit better.  
> 
> Is it a C code obfuscation contest?
> 

return '/'/'/';

-- Steve


Re: [PATCH 1/3] lib/string_helpers: Consolidate yesno() implementation

2022-01-19 Thread Steven Rostedt
On Wed, 19 Jan 2022 11:15:08 +0200
Andy Shevchenko  wrote:

> > +static inline const char *yesno(bool v) { return v ? "yes" : "no"; }  
> 
> 
> 
> Perhaps keep it on 4 lines? Yes, yes/no is short, but if we add others
> (enable/disable) it will not be possible to keep on one line. And hence
> style will be broken among similar functions.

Agreed. Functions should always be of the normal format:

type func(params)
{
body;
}

Unless it is a stub function.

type func(params) { return 0; }

-- Steve


Re: [PATCH 1/3] lib/string_helpers: Consolidate yesno() implementation

2022-01-19 Thread Steven Rostedt
On Wed, 19 Jan 2022 11:18:59 +0200
Sakari Ailus  wrote:

> On Tue, Jan 18, 2022 at 11:24:48PM -0800, Lucas De Marchi wrote:
> > @@ -1354,8 +1345,7 @@ static bool tomoyo_print_condition(struct 
> > tomoyo_io_buffer *head,
> > case 3:
> > if (cond->grant_log != TOMOYO_GRANTLOG_AUTO)
> > tomoyo_io_printf(head, " grant_log=%s",
> > -tomoyo_yesno(cond->grant_log ==
> > - TOMOYO_GRANTLOG_YES));
> > +yesno(cond->grant_log == 
> > TOMOYO_GRANTLOG_YES));  
> 
> This would be better split on two lines.

Really? Yuck!

I thought the "max line size" guideline was going to grow to a 100, but I
still see it as 80. But anyway...

cond->grant_log ==
TOMOYO_GRANTLOG_YES

is not readable at all. Not compared to

    cond->grant_log == TOMOYO_GRANTLOG_YES

I say keep it one line!

Reviewed-by: Steven Rostedt (Google) 

-- Steve

> 
> Then,
> 
> Reviewed-by: Sakari Ailus 



Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

2021-11-22 Thread Steven Rostedt
On Fri, 19 Nov 2021 15:46:31 -0700
jim.cro...@gmail.com wrote:


> > So I could see us supporting subsystem specific trace buffer output
> > via dynamic debug here. We could add new dev_debug() variants that
> > allow say a trace buffer to be supplied. So in that way subsystems
> > could 'opt-out' of having their data put into the global trace buffer.
> > And perhaps some subsystems we would want to allow output to both
> > buffers? The subsystem specific one and the global one?
> >  
> 
>  * trace_array_printk - Print a message to a specific instance
>  * @tr: The instance trace_array descriptor
>  * @ip: The instruction pointer that this is called from.
>  * @fmt: The format to print (printf format)
>  *
> 
> what happens when @tr == NULL ?

It does nothing, but perhaps crash the kernel.

> It could allow up-flow of events to the global instance

Absolutely not!

Then it's just a reimplementation of trace_printk(). Which I refuse to
have.

Nothing should just dump to the main instance. Once we allow that, then
everyone will be dumping there and you will no longer be able to trace
anything because it will be filled with noise.

What is allowed is an event that acts like a trace_printk() but is an
event, which you can turn off (have default off), and even pick which
instance to go to.

> 
> > Thanks,
> >
> > -Jason
> >
> >  
> 
> So I wonder, is there any conceptual utility to this ?
> 
> echo 1 > instances/foo/filter_up  # enable event upflow (or query-time 
> merging?)
> 
> Maybe enabling this causes other files (the ones missing from
> instances/foo) to magically appear
> so all those filtering capacities also appear.


I've been busy doing other things so I haven't been keeping up with
this thread (which I need to go back and read). Perhaps it was already
stated, but I don't know why you want that.

trace-cmd can read several instances (including the top level one) and
interleave them nicely, if that is what you are looking for. So can
KernelShark.

-- Steve


Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

2021-11-12 Thread Steven Rostedt
On Fri, 12 Nov 2021 10:08:41 -0500
Jason Baron  wrote:

> > A key difference between that patchset and this patch (besides that
> > small fact that I used +x instead of +T) was that my patchset allowed
> > the dyndbg trace to be emitted to the main buffer and did not force them
> > to be in an instance-specific buffer.  
> 
> Yes, I agree I'd prefer that we print here to the 'main' buffer - it seems to 
> keep things simpler and easier to combine the output from different
> sources as you mentioned.

I do not want anything to print to the "main buffer" that can not be
filtered or turned off by the tracing infrastructure itself (aka tracefs
file system).

Once we allow that, then the trace file will become useless because
everything will write to the main buffer and fill it with noise.

Events that can be enabled and disabled from tracefs are fine, as they can
be limited. This is why I added that nasty warning if people leave around
trace_printk(), because it does exactly this (write to the main buffer).
It's fine for debugging a custom kernel, but should never be enabled in
something that is shipped, or part of mainline.

-- Steve


Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

2021-11-12 Thread Steven Rostedt
On Fri, 12 Nov 2021 12:32:23 -0500
Jason Baron  wrote:

> Ok, it looks like Vincent's patch defines a dyndbg event and then uses
> 'trace_dyndbg()' to output to the 'main' log. So all dynamic output to
> the 'main' ftrace buffer goes through that event if I understand it
> correctly. Here's a pointer to it for reference:
> 
> https://lore.kernel.org/lkml/20200825153338.17061-3-vincent.whitchu...@axis.com/
> 
> Would you be ok with that approach?

Yes that approach is fine, because it doesn't actually go to the main log
unless you enable the dyndbg trace event in the main buffer. You could
also enable that event in an instance and have it go there.

-- Steve


Re: [PATCH V2] treewide: Add missing semicolons to __assign_str uses

2021-06-30 Thread Steven Rostedt
On Wed, 30 Jun 2021 04:28:39 -0700
Joe Perches  wrote:

> On Sat, 2021-06-12 at 08:42 -0700, Joe Perches wrote:
> > The __assign_str macro has an unusual ending semicolon but the vast
> > majority of uses of the macro already have semicolon termination.  
> 
> ping?
> 

I wasn't sure I was the one to take this. I can, as I can run tests on
it as well. I have some last minute fixes sent to me on something else,
and I can apply this along with them.

-- Steve
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] treewide: Add missing semicolons to __assign_str uses

2021-06-14 Thread Steven Rostedt
On Sat, 12 Jun 2021 08:42:27 -0700
Joe Perches  wrote:

> The __assign_str macro has an unusual ending semicolon but the vast
> majority of uses of the macro already have semicolon termination.
> 
> $ git grep -P '\b__assign_str\b' | wc -l
> 551
> $ git grep -P '\b__assign_str\b.*;' | wc -l
> 480
> 
> Add semicolons to the __assign_str() uses without semicolon termination
> and all the other uses without semicolon termination via additional defines
> that are equivalent to __assign_str() with the eventual goal of removing
> the semicolon from the __assign_str() macro definition.
> 
> Link: 
> https://lore.kernel.org/lkml/1e068d21106bb6db05b735b4916bb420e6c9842a.ca...@perches.com/

FYI, please send new patches as new threads. Otherwise it is likely to
be missed.

-- Steve
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: adjust the pid in the grab_id trace point

2020-08-12 Thread Steven Rostedt
On Wed, 12 Aug 2020 16:36:36 +0200
Christian König  wrote:

> Am 12.08.20 um 16:17 schrieb Steven Rostedt:
> > On Fri, Aug 07, 2020 at 03:36:58PM +0200, Christian König wrote:  
> >> Trace something useful instead of the pid of a kernel thread here.
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> >> index 5da20fc166d9..07f99ef69d91 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> >> @@ -228,6 +228,7 @@ TRACE_EVENT(amdgpu_vm_grab_id,
> >> ),
> >>   
> >>TP_fast_assign(
> >> + __entry->ent.pid = vm->task_info.pid;  
> > If the ent.pid is not the pid you are interested in for this trace event, 
> > just
> > add a "pid" field to the trace event and place it there. Do not modify the
> > generic pid that is recorded, as we would like that to be consistent for all
> > trace events.  
> 
> The problem my userspace guys have is that this doesn't work with 
> "trace-cmd -P $pid".
> 
> But I think I can teach them how filters work :)

Yep, trace-cmd record -e event -f "pid == $pid"

> 
> > The "ent.pid" turns into "common_pid" in the field, leaving "pid" free to 
> > use.
> > Other trace events (like sched_waking) record a pid field that is not the 
> > same
> > as the pid of the executing task.  
> 
> Yes, we thought about this alternative as well.
> 
> > The "ent.pid" should always be the pid of the task that executed the event. 
> >  
> 
> Why? For the case here we just execute a work item in the background for 
> an userspace process.
> 
> Tracing the pid of the worker pool which executes it doesn't seem to 
> make to much sense.

Maybe not for you, but it does for me. All trace events show what
happened when it happened and who executed it. I like to see what
worker threads are executing. I may filter on the worker thread, and by
changing the ent.pid, I wont see what it is doing.

That said, I think I may add a feature to a trace evnt for a special filter
to say, "test field to the set_event_pid", and if it exists in that
file to include that event in the filtered trace. This would include
sched_waking trace events as well.

That way "trace-cmd record -P $pid" will still work for your case.

-- Steve
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: adjust the pid in the grab_id trace point

2020-08-12 Thread Steven Rostedt
On Fri, Aug 07, 2020 at 03:36:58PM +0200, Christian König wrote:
> Trace something useful instead of the pid of a kernel thread here.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 5da20fc166d9..07f99ef69d91 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -228,6 +228,7 @@ TRACE_EVENT(amdgpu_vm_grab_id,
>),
>  
>   TP_fast_assign(
> +__entry->ent.pid = vm->task_info.pid;

If the ent.pid is not the pid you are interested in for this trace event, just
add a "pid" field to the trace event and place it there. Do not modify the
generic pid that is recorded, as we would like that to be consistent for all
trace events.

The "ent.pid" turns into "common_pid" in the field, leaving "pid" free to use.
Other trace events (like sched_waking) record a pid field that is not the same
as the pid of the executing task.

The "ent.pid" should always be the pid of the task that executed the event.

-- Steve


>  __entry->pasid = vm->pasid;
>  __assign_str(ring, ring->name)
>  __entry->vmid = job->vmid;
> -- 
> 2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v13 04/20] mm, arm64: untag user pointers passed to memory syscalls

2019-03-28 Thread Steven Rostedt
On Thu, 28 Mar 2019 19:10:07 +0100
Andrey Konovalov  wrote:

> > > Signed-off-by: Andrey Konovalov 
> > > ---
> > >  ipc/shm.c  | 2 ++
> > >  mm/madvise.c   | 2 ++
> > >  mm/mempolicy.c | 5 +
> > >  mm/migrate.c   | 1 +
> > >  mm/mincore.c   | 2 ++
> > >  mm/mlock.c | 5 +
> > >  mm/mmap.c  | 7 +++
> > >  mm/mprotect.c  | 1 +
> > >  mm/mremap.c| 2 ++
> > >  mm/msync.c | 2 ++
> > >  10 files changed, 29 insertions(+)  
> >
> > I wonder whether it's better to keep these as wrappers in the arm64
> > code.  
> 
> I don't think I understand what you propose, could you elaborate?

I believe Catalin is saying that instead of placing things like:

@@ -1593,6 +1593,7 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, 
shmaddr, int, shmflg)
unsigned long ret;
long err;
 
+   shmaddr = untagged_addr(shmaddr);

To instead have the shmaddr set to the untagged_addr() before calling
the system call, and passing the untagged addr to the system call, as
that goes through the arm64 architecture specific code first.

-- Steve
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [trivial PATCH V2] treewide: Align function definition open/close braces

2018-03-22 Thread Steven Rostedt
On Wed, 21 Mar 2018 15:09:32 -0700
Joe Perches  wrote:

> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index ad1d6164e946..50f44b7b2b32 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -196,7 +196,7 @@ struct notifier_block module_trace_bprintk_format_nb = {
>  };
>  
>  int __trace_bprintk(unsigned long ip, const char *fmt, ...)
> - {
> +{
>   int ret;
>   va_list ap;
>  
> @@ -214,7 +214,7 @@ int __trace_bprintk(unsigned long ip, const char *fmt, 
> ...)
>  EXPORT_SYMBOL_GPL(__trace_bprintk);
>  
>  int __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap)
> - {
> +{
>   if (unlikely(!fmt))
>   return 0;
>  

Acked-by: Steven Rostedt (VMware) 

-- Steve
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx