On Thu, 1 Feb 2024 13:21:37 +0100
Ahmad Fatoum <a.fat...@pengutronix.de> wrote:

> Hello,
> 
> I semi-regularly debug probe failures. For drivers that use dev_err_probe
> rigorously, this is a quick matter: The probe function records a deferral
> reason and if the deferral persists, deferred_probe_timeout_work_func() will
> print the collected reasons, even if PID 1 is never started.
> 
> For drivers that don't call dev_err_probe, I find myself sometimes doing
> printf debugging inside the probe function.

Is the driver built in or started after init?

> 
> I would like to replace this with the function graph tracer:
> 
>   - record the probe function, configured over kernel command line
>     (The device indefinitely deferring probe is printed to the console,
>      so I know what I am looking for on the next boot)
> 
>   - Dump the function graph trace
> 
>   - See if the last call before (non-devm) cleanup is getting a clock, a GPIO,
>     a regulator or w/e.
> 
> For this to be maximally useful, I need to configure this not only at
> boot-time, but also dump the ftrace buffer at boot time. Probe deferral can
> hinder the kernel from calling init and providing a shell, where I could read
> /sys/kernel/tracing/trace.

OK so the driver is built in.

> 
> I found following two mechanisms that looked relevant, but seem not to
> do exactly what I want:
> 
>   - tp_printk: seems to be related to trace points only and not usable
>     for the function graph output
> 
>   - dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
>     one could patch the kernel to check a oops_on_probe_deferral or
>     dump_on_probe_deferral kernel command line parameter in
>     deferred_probe_timeout_work_func()?
> 
> 
> Is there existing support that I am missing? Any input on whether this
> would be a welcome feature to have?

Well, you can start function_graph on the kernel command line and filter on
a given function:

 ftrace=function_graph ftrace_graph_filter=probe_func

You can add your own ftrace_dump() call on some kind of detected error and
gate it behind a kernel command line parameter. For example, RCU has:

  rcupdate.rcu_cpu_stall_ftrace_dump=

Which will do an ftrace dump when an RCU stall is triggered.
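
For the probe deferral case, a minimal and untested sketch of the same idea
could look like the fragment below. It assumes the code sits in
drivers/base/dd.c next to deferred_probe_timeout_work_func(). The parameter
name dump_on_probe_deferral is the one proposed above and does not exist in
mainline, and the helper deferred_probe_dump_trace() is a hypothetical name
used only for illustration.

```c
/*
 * Sketch only, not compile-tested. Intended for drivers/base/dd.c.
 * On kernels from around the time of this thread, ftrace_dump() and
 * DUMP_ALL are declared via <linux/kernel.h>; other versions may differ.
 */
#include <linux/init.h>
#include <linux/kernel.h>

/* Hypothetical switch, off unless given on the kernel command line. */
static bool dump_on_probe_deferral;

static int __init dump_on_probe_deferral_setup(char *str)
{
	dump_on_probe_deferral = true;
	return 1;	/* parameter was handled */
}
__setup("dump_on_probe_deferral", dump_on_probe_deferral_setup);

/*
 * Hypothetical helper: call it from deferred_probe_timeout_work_func()
 * once the deferred probe timeout has expired.
 */
static void deferred_probe_dump_trace(void)
{
	if (dump_on_probe_deferral)
		ftrace_dump(DUMP_ALL);	/* print all CPU buffers to the console */
}
```

Booting with dump_on_probe_deferral in addition to the tracer options would
then print the recorded function graph to the console when the timeout
fires, without needing a shell.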

-- Steve

